1 APM 421 Probability Theory: Discrete Random Variables
Jay Taylor (ASU), Fall 2013
2 Outline
1 Motivation
2 Infinite Sets and Cardinality
3 Countable Additivity
4 Discrete Random Variables
5 Probability Generating Functions
6 Geometric and related distributions
7 Poisson distribution
8 Fluctuation Tests
9 Poisson Processes
3 Motivation
Distributions on Infinite Spaces
Example: Suppose that a coin with probability p = 1/2 of landing on heads is flipped repeatedly, and let N be the number of flips that land on tails before the first heads. Assuming that the flips are independent of one another, we expect that
P(N = k) = (1/2)^{k+1}
for every integer k ≥ 0, since the event {N = k} occurs if and only if the first k flips all land on tails and the (k+1)st flip lands on heads. This suggests that we can define N to be a random variable with values in the set of natural numbers N = {0, 1, 2, ...} and distribution given by the above formula. In particular, notice that
Σ_{k=0}^∞ P(N = k) = Σ_{k=0}^∞ (1/2)^{k+1} = 1,
which suggests that P(N < ∞) = 1.
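The slides contain no code, but the claimed distribution is easy to check by simulation. A minimal sketch (Python; all function and variable names here are my own, not from the slides):

```python
import random

def flips_until_heads(rng):
    """Count tails before the first heads with a fair coin."""
    n = 0
    while rng.random() >= 0.5:  # treat >= 0.5 as "tails"
        n += 1
    return n

rng = random.Random(0)
trials = 100_000
counts = [flips_until_heads(rng) for _ in range(trials)]

for k in range(4):
    empirical = counts.count(k) / trials
    exact = (1 / 2) ** (k + 1)
    print(f"P(N = {k}): empirical {empirical:.4f}, exact {exact:.4f}")
```

The empirical frequencies should track (1/2)^{k+1} to within Monte Carlo error of a few parts in a thousand.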
4 Motivation
Unfortunately, our current formulation of probability does not allow us to define such a random variable N. The problem is that at the moment we only require probability distributions to be finitely additive. While we can calculate the probabilities of events determined by finitely many outcomes, such as
P(N ≤ 4) = Σ_{k=0}^4 P(N = k) = Σ_{k=0}^4 (1/2)^{k+1} = 31/32,
we cannot similarly conclude that
P(N is even) = Σ_{k=0}^∞ P(N = 2k) = Σ_{k=0}^∞ (1/2)^{2k+1},
since
{N is even} = ∪_{k=0}^∞ {N = 2k}
expresses the event that N is even as a disjoint union of infinitely many sets, and finite additivity tells us nothing in this case.
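For what it is worth, the sum that finite additivity cannot justify is a geometric series: Σ_{k≥0} (1/2)^{2k+1} = (1/2)/(1 − 1/4) = 2/3, which is the value a countably additive theory will eventually assign. A quick numerical check by truncating the series (Python; names mine):

```python
# Truncate the infinite sums; the tails are geometrically small.
p_le_4 = sum((1 / 2) ** (k + 1) for k in range(5))        # P(N <= 4)
p_even = sum((1 / 2) ** (2 * k + 1) for k in range(200))  # candidate P(N even)

print(p_le_4)  # 0.96875, i.e. 31/32
print(p_even)  # ~ 2/3
```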
5 Motivation
One way to address this problem is to define a distribution ν on N by requiring
ν(A) = Σ_{k∈A} (1/2)^{k+1}
for every subset A ⊆ N. Clearly, ν is coherent:
- ν(A) ≥ 0 for every A ⊆ N;
- ν(N) = Σ_{k≥0} ν({k}) = Σ_{k=0}^∞ (1/2)^{k+1} = 1;
- ν is finitely additive, since given any pair of disjoint subsets A, B ⊆ N,
ν(A ∪ B) = Σ_{k∈A∪B} (1/2)^{k+1} = Σ_{k∈A} (1/2)^{k+1} + Σ_{k∈B} (1/2)^{k+1} = ν(A) + ν(B).
However, ν is not the only coherent distribution on N which assigns probability (1/2)^{k+1} to each singleton set {k}. In fact, there are infinitely many such distributions, and we don't yet know which one (if any) of these should be chosen as the distribution of N.
6 Motivation
The previous example demonstrated that finite additivity on its own may not be strong enough to uniquely define the probabilities of infinite sets. Our next example will reveal an even more serious defect: although sure loss cannot occur if we use a coherent probability distribution to place bets on a finite sample space, this is not true for infinite sample spaces.
Example: Let N = {0, 1, 2, ...} denote the set of natural numbers, and for each m ≥ 1 and k = 0, ..., m − 1, let R_{m,k} be the set
R_{m,k} = {k + nm : n ≥ 0} = {k, k + m, k + 2m, k + 3m, ...}.
The sets R_{m,k} are called residue classes mod m. For example, R_{2,0} is the set of non-negative even integers, while R_{2,1} is the set of non-negative odd integers. It can be shown that there exists a coherent distribution µ on N with the following properties:
- µ(R_{m,k}) = 1/m for all m ≥ 1 and k = 0, ..., m − 1;
- µ(B) = 0 for any finite subset B ⊆ N. In particular, µ({n}) = 0 for every n ≥ 0.
7 Motivation
Since R_{1,0} = N, it follows that
1 = µ(N) = µ(∪_{n≥0} {n}) ≠ Σ_{n≥0} µ({n}) = 0,
but this is no contradiction, since we only require µ to be finitely additive. We will say that µ is a uniform distribution on the natural numbers.
Although µ is coherent, it has some unsavory properties. Let A be an event to which you assign probability P(A) = 1/2 and let X be the indicator variable for A, i.e., X = 1 if A occurs and X = 0 if A does not occur. We will define a second random variable Y with values in N as follows. Given any subset B ⊆ N, define
P(Y ∈ B | X) = µ(B) if X = 0, and ν(B) if X = 1,
where ν is the distribution defined in the first example and µ is the uniform distribution defined in this example.
8 Motivation
Since Y has been defined in terms of X, we can use the law of total probability to calculate the probabilities of events of the form {Y ∈ B}. For example, observe that for every n ≥ 0,
P(Y = n) = P(Y = n | X = 0)P(X = 0) + P(Y = n | X = 1)P(X = 1)
         = µ({n})·(1/2) + ν({n})·(1/2)
         = 0·(1/2) + (1/2)^{n+1}·(1/2)
         = (1/2)^{n+2},
since µ assigns probability 0 to every finite subset of N.
9 Motivation
Likewise, we can use Bayes' formula to calculate the conditional distribution of X given Y, e.g.,
P(X = 1 | Y = n) = P(X = 1, Y = n)/P(Y = n)
                 = P(Y = n | X = 1)P(X = 1)/P(Y = n)
                 = (1/2)^{n+1}·(1/2)/(1/2)^{n+2}
                 = 1,
which holds for every n ≥ 0. In other words, although X is a priori equally likely to be 1 or 0, as soon as we learn the value of Y we can immediately deduce that X = 1, no matter what value Y assumes.
10 Motivation
These observations have the following consequences for wagers on the event A. In the absence of any information about Y, coherence commits us to paying $0.50 for a $1 bet that A will occur, and likewise $0.50 for a $1 bet that A will not occur. However, if we subsequently learn the value of Y, then we become certain that A will occur, no matter what value Y takes, and so the wager on the complement of A immediately becomes worthless. This example illustrates a phenomenon known as dynamic sure loss: by gaining information we guarantee that we will lose money. Notice that we cannot escape this quandary by re-assigning the unconditional probability of A to be 1, since in the absence of information about Y, A is as likely to occur as not. Rather, the problem is that coherence is not a strong enough condition on distributions on infinite spaces to avoid certain forms of sure loss. Fortunately, we can avoid these dilemmas by requiring that probability distributions on infinite spaces satisfy a stronger set of conditions.
11 Infinite Sets and Cardinality
Interlude: Infinite Sets and Cardinality
Before we can begin to extend our theory to sets with infinitely many elements, we need to take a closer look at some of the properties of infinite sets. We begin by addressing the following question: what do we mean when we say that two sets, A and B, have the same number of elements?
This is easy when A and B are finite. For example, if
A = {apples, oranges, pears}    B = {87, J, c}
then since A and B each contain three elements, it is clear that they have the same number of elements. In other words, we count the number of elements in each set and check whether these numbers are equal.
12 Infinite Sets and Cardinality
To extend this concept further, we need to take a closer look at counting. When we count the number of elements in a set X and decide that this number is n, what we are doing is constructing a function Φ from the set {1, 2, ..., n} into the set X that is both one-to-one and onto:
- Φ is one-to-one if no two distinct elements are assigned the same value by Φ, i.e., if i ≠ j, then Φ(i) ≠ Φ(j);
- Φ is onto if every element in the range is the image of an element in the domain, i.e., for every x ∈ X there is an element i such that Φ(i) = x.
A function Φ that is both one-to-one and onto is said to be bijective. In general, there may be many bijections Φ between {1, 2, ..., n} and X, but one way to select such a function is to label the elements of X = {x_1, ..., x_n} and then define
Φ(i) = x_i for i = 1, ..., n.
13 Infinite Sets and Cardinality
This way of thinking about counting can also be applied to pairs of sets whose sizes are being compared. Specifically, if X and Y both have n elements, then there are bijective functions Φ^{(X)} and Φ^{(Y)} from {1, ..., n} onto X and Y, respectively. It follows that Ψ = Φ^{(Y)} ∘ (Φ^{(X)})^{−1} is a bijective function from X onto Y. In fact, the converse is also true: if X and Y are finite and there is a bijection between X and Y, then they have the same number of elements.
For example, a bijection can be constructed between the sets A and B as follows:
apples ↔ 1 ↔ 87
oranges ↔ 2 ↔ J
pears ↔ 3 ↔ c
which gives us the mapping Ψ : A → B with Ψ(apples) = 87, Ψ(oranges) = J, and Ψ(pears) = c.
14 Infinite Sets and Cardinality
These observations lead us to the following definition.
Definition. We say that two sets X and Y have the same cardinality, written |X| = |Y|, if there exists a bijective function Φ between X and Y. In contrast, we say that the cardinality of X is less than the cardinality of Y, written |X| < |Y|, if X and Y do not have the same cardinality and there is a subset D ⊆ Y such that X and D have the same cardinality.
Remarks:
- Interpretation: Cardinality provides us with a way to compare the sizes of different sets. Sets that have the same cardinality have, in some sense, the same number of elements, even if that number is infinite.
- Cardinality is not the only way to measure the size of an infinite set, but it is one of the most basic notions insofar as it requires no additional structure on the set. Other, more specialized notions of size include Lebesgue measure, Hausdorff dimension, and capacity.
15 Infinite Sets and Cardinality
Given any two finite sets A and B, either both sets have the same number of elements or one of the two has fewer elements than the other. This is a consequence of the fact that the counting numbers are totally ordered: given positive integers n and m, either n = m or n < m or n > m. However, things are not so straightforward for infinite sets. In fact, the following two statements are equivalent, in the sense that each one implies the other:
- Law of Trichotomy: Given any two sets X and Y, either |X| = |Y| or |X| < |Y| or |X| > |Y|.
- Axiom of Choice: Given any collection of distinct, non-empty sets S_α, α ∈ A, there exists a set C which contains exactly one element from each of the sets S_α.
Although the axiom of choice is accepted by many mathematicians as one of the fundamental axioms of set theory, it leads to a number of odd results, such as the Banach-Tarski paradox, which asserts that a three-dimensional ball can be dissected into a finite number of pieces which can then be reassembled into two disjoint balls, each having the same volume as the original.
16 Infinite Sets and Cardinality
One of the stranger properties of cardinality is that two sets X and Y can have the same cardinality even when X is a proper subset of Y.
Example: The positive integers Z⁺ are a proper subset of the natural numbers N, but the mapping Φ(n) = n + 1 is a bijection from N onto Z⁺, and so both sets have the same cardinality.
Example: The even natural numbers 2N are a proper subset of the natural numbers N, but the mapping Φ(n) = 2n is a bijection from N onto 2N, and so these sets also have the same cardinality.
Hilbert's Paradox of the Grand Hotel: Suppose that a hotel contains an infinite number of rooms, numbered 1, 2, 3, ..., and that all of the rooms are occupied. A new guest arrives and seeks accommodation. To make a place for them, the hotel moves each guest from their current room to the room with the next higher number: the person in room 1 moves to room 2, the person in room 2 moves to room 3, and so forth. The new guest is then placed in room 1. In this way, the hotel is able to accommodate new arrivals even when there are no vacancies.
17 Infinite Sets and Cardinality
Sets that have the same cardinality as the natural numbers or one of its subsets play an especially important role in probability theory.
Definition. A set X is said to be countable if X is either finite or has the same cardinality as the natural numbers N. In the latter case, we say that X is countably infinite. If X is neither finite nor countably infinite, then X is said to be uncountable.
The following are examples of countable sets:
- The natural numbers N = {0, 1, 2, ...};
- The positive integers Z⁺ = {1, 2, ...};
- The integers Z = {0, ±1, ±2, ...};
- The rational numbers Q = {p/q : p, q ∈ Z, q ≠ 0};
- Any countable union of countable sets, i.e., if A_i is countable for every i ≥ 1, then the union A = ∪_i A_i is also countable.
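The last bullet can be made concrete: if we can list N × N in a single sequence, we can interleave countably many countable lists into one. The standard way is to walk the anti-diagonals i + j = 0, 1, 2, ... (the Cantor pairing function). A sketch (Python; function names are mine):

```python
def cantor_pair(i, j):
    """Bijection N x N -> N: walk the anti-diagonals i + j = 0, 1, 2, ..."""
    d = i + j
    return d * (d + 1) // 2 + j

# Check injectivity on a finite window: distinct pairs get distinct codes.
window = {(i, j) for i in range(30) for j in range(30)}
codes = {cantor_pair(i, j) for (i, j) in window}
assert len(codes) == len(window)  # no collisions, so the map is one-to-one
print(sorted(codes)[:6])  # the smallest codes: 0, 1, 2, 3, 4, 5
```

Listing the i-th element of the j-th set at position cantor_pair(i, j) is exactly the enumeration behind "a countable union of countable sets is countable."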
18 Infinite Sets and Cardinality
As the following theorem shows, not all infinite sets are countably infinite.
Theorem. The set [0, 1] is uncountable.
Proof: We prove this by contradiction, using a clever method invented by Georg Cantor that has come to be known as the Cantor diagonalization argument. If [0, 1] is countable, then there is a bijection Φ between Z⁺ and [0, 1]. Each of the numbers Φ(n) ∈ [0, 1] has a decimal expansion, which can be written as
Φ(n) = 0.c_{n1} c_{n2} c_{n3} ...
Let x ∈ [0, 1] be the number with decimal expansion x = 0.x_1 x_2 x_3 ..., where x_n = 2 whenever c_{nn} ≠ 2 and x_n = 1 whenever c_{nn} = 2. I claim that there is no integer n > 0 such that Φ(n) = x.
19 Infinite Sets and Cardinality
Indeed, if there were an n > 0 such that Φ(n) = x, then we could write the decimal expansion of x in two ways:
x = 0.x_1 x_2 x_3 ... = 0.c_{n1} c_{n2} c_{n3} ...
However, since decimal expansions that do not end in a repeating string of all 0's or all 9's are unique, it must be the case that x_i = c_{ni} for all i ≥ 1. In particular, x_n = c_{nn}, which is a contradiction since we chose each x_n so that x_n ≠ c_{nn}. This shows that no bijection can exist between Z⁺ and [0, 1], which in turn implies that [0, 1] is uncountable.
Remarks:
- Since Z⁺ has the same cardinality as the set D = {1/n : n ≥ 1} ⊆ [0, 1], it follows that the cardinality of [0, 1] is strictly larger than that of Z⁺. In other words, some infinite sets are bigger than others.
- It can be shown that any interval [a, b] or (a, b) with a < b is uncountably infinite. In particular, the real numbers R are uncountable, as are all of the Euclidean spaces Rⁿ.
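The diagonal argument is constructive: given any list of decimal expansions, it produces a number missing from the list. A toy version over finite digit strings (Python; the sample list and names are mine):

```python
def diagonal_escape(expansions):
    """Given decimal digit strings, build x whose n-th digit differs from
    the n-th digit of the n-th number: use 2, unless that digit is 2, then 1."""
    digits = []
    for n, e in enumerate(expansions):
        c_nn = e[n]
        digits.append('1' if c_nn == '2' else '2')
    return '0.' + ''.join(digits)

listed = ['1415926', '2222222', '7182818', '5000000',
          '0000000', '9999999', '3333333']
x = diagonal_escape(listed)
print(x)  # differs from the n-th entry in its n-th digit, for every n
```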
20 Countable Additivity
Countable Additivity
To avoid the kinds of difficulties exposed by the examples given at the beginning of these slides, we will require that probability distributions on infinite sets satisfy the following additional condition.
Definition. Let S be a set and let P(S) be the collection of all subsets of S, i.e., the power set of S. A function µ : P(S) → R is said to be countably additive if for any countably infinite collection of disjoint sets A_1, A_2, ... in P(S) we have
µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i).
Notice that if µ is countably additive and µ(∅) is finite, then µ(∅) = 0. Indeed, if we take A_i = ∅ for all i ≥ 1, then
µ(∅) = Σ_{i=1}^∞ µ(∅),
which is only possible if µ(∅) = 0.
21 Countable Additivity
Theorem. Let S be a countably infinite set and suppose that P : P(S) → [0, 1] is a countably additive function with P(S) = 1. Then P is coherent.
Proof: To show that P is coherent, we need only show that it is finitely additive. Suppose that A_1, ..., A_n is a finite collection of disjoint subsets of S and define A_{n+k} = ∅ for every k ≥ 1. Then A_1, A_2, ..., extended in this fashion, is a countably infinite sequence of disjoint subsets of S, and by countable additivity we know that
P(∪_{i=1}^n A_i) = P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) = Σ_{i=1}^n P(A_i),
which shows that P is finitely additive.
22 Countable Additivity
In view of the previous theorem, we will adopt the following definition for probability distributions on countably infinite sets.
Definition. Let S = {s_1, s_2, ...} be a countably infinite set. A probability distribution on S is a function P : P(S) → [0, 1] which satisfies the following conditions:
1 P(A) ≥ 0 for every subset A ⊆ S;
2 P(S) = 1;
3 P is countably additive, i.e., if A_1, A_2, ... is a countable sequence of disjoint subsets of S, then
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
23 Countable Additivity
Because every subset of a countably infinite set is either finite or countably infinite, every probability distribution on a countably infinite set is uniquely determined by the probabilities that it assigns to the individual members of the set.
Theorem. Suppose that P is a probability distribution on a countably infinite set S = {s_1, s_2, ...}. Then, for any subset A ⊆ S, we have
P(A) = Σ_{s∈A} P({s}).
Proof: The result follows from the countable additivity of P and the fact that A can be expressed as a countable disjoint union of the singleton sets containing the elements of A:
A = ∪_{s∈A} {s}.
24 Countable Additivity
In particular, this identity leads to an easy method for constructing probability distributions on countably infinite sets.
Theorem. Let S = {s_1, s_2, ...} be a countably infinite set and suppose that p_1, p_2, ... is a sequence of non-negative numbers that sums to 1:
Σ_{i=1}^∞ p_i = 1.
If P : P(S) → [0, 1] is defined by
P(A) = Σ_{s_i∈A} p_i,
then P is a probability distribution on S.
25 Countable Additivity
Proof: It is clear from the definition that P(A) ≥ 0 for every subset A ⊆ S and also that
P(S) = Σ_{s_i∈S} p_i = Σ_{i=1}^∞ p_i = 1.
Furthermore, if A_1, A_2, ... is a countably infinite sequence of disjoint subsets of S, then
P(∪_{k=1}^∞ A_k) = Σ_{s_i ∈ ∪_k A_k} p_i = Σ_{k=1}^∞ Σ_{s_i∈A_k} p_i = Σ_{k=1}^∞ P(A_k),
which shows that P is countably additive.
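The construction in this theorem translates directly into code, at least for finite truncations of the weight sequence: fix non-negative weights summing to (nearly) 1 and let P(A) add up the weights of the points of A. A sketch using the geometric weights p_n = (1/2)^{n+1} from the earlier example (Python; names mine):

```python
def make_distribution(p):
    """Given a pmf dict {outcome: weight}, return P acting on subsets."""
    def P(A):
        return sum(p[s] for s in A)
    return P

# Truncated geometric weights; the omitted tail has mass 2^-60.
p = {n: (1 / 2) ** (n + 1) for n in range(60)}
P = make_distribution(p)

evens = {n for n in p if n % 2 == 0}
odds = {n for n in p if n % 2 == 1}
# Additivity over the disjoint pieces, and total mass ~ 1:
print(P(evens) + P(odds), P(p.keys()))
```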
26 Discrete Random Variables
Having defined probability distributions on countably infinite spaces, we can also extend our definition of a random variable to include variables which can take on countably infinitely many possible values. The following definition is key.
Definition. A discrete random variable is a random quantity X which takes values in a countable set S = {x_1, x_2, ...}. (Here S can be finite or countably infinite.) In this case, the distribution of X is the probability distribution on S defined by
P(A) = P(X ∈ A)
for any subset A ⊆ S. Furthermore, the probability mass function of X is the function p : S → [0, 1] defined by
p(x) = P(X = x)
for any x ∈ S.
27 Discrete Random Variables
Example: For each n ≥ 0, let p_n = 2^{−(n+1)}. Since
Σ_{n=0}^∞ p_n = Σ_{n=0}^∞ 2^{−(n+1)} = 1,
we can define a probability distribution on the natural numbers N = {0, 1, 2, ...} by setting
P(A) = Σ_{n∈A} p_n.
With this machinery in place, we can now formally define a random variable N equal to the number of tails obtained before the first heads when a fair coin is tossed repeatedly. In this case, the probability mass function of N is the function p : N → [0, 1] defined by p(n) = 2^{−(n+1)}.
28 Discrete Random Variables
Example: There is no uniform distribution on the natural numbers. Indeed, a distribution on a set S is said to be uniform if every element of S has the same probability. Thus, if P were uniform on N, then there would exist a non-negative number c ≥ 0 such that for every n ≥ 0,
c = p_n = P({n}).
However, since N is the countably infinite disjoint union of the singleton sets {n}, and since P(N) = 1 for any probability distribution on N, the countable additivity of P would imply that
1 = P(N) = Σ_{n=0}^∞ p_n = Σ_{n=0}^∞ c,
and the right-hand side is either 0 if c = 0 or ∞ if c > 0. In either case we have a contradiction, and so P cannot be uniform on N.
29 Discrete Random Variables
Theorem. Suppose that X is a discrete random variable with values in the countable set S and let p : S → [0, 1] be the probability mass function of X. Then
Σ_{x∈S} p(x) = 1
and
P(X ∈ A) = Σ_{x∈A} p(x)
for any subset A ⊆ S.
Exercise: Prove this theorem.
30 Discrete Random Variables
Previously we defined the expected value of a random variable X with finitely many possible values S = {x_1, ..., x_n} to be the weighted sum of these values:
E[X] = Σ_{i=1}^n P(X = x_i) x_i.
Although we would like to extend this definition to random variables with countably infinitely many possible values, the following example shows that this is not entirely straightforward.
Example: Let X be a random variable with values in the integers Z = {0, ±1, ±2, ...} and the following probability mass function p : Z → [0, 1]:
p(n) = P(X = n) = 0 if n = 0, and 1/(C n²) if n ≠ 0.
31 Discrete Random Variables
The constant C included in the definition of the probability mass function of X is said to be a normalizing constant and must be chosen so that the probabilities sum to 1:
1 = Σ_{n∈Z} p(n) = (2/C) Σ_{n=1}^∞ 1/n² = (2/C)·(π²/6) = π²/(3C).
Thus C = π²/3, and so X is a properly defined discrete random variable. Now suppose that we define the expectation of X to be
E[X] "=" Σ_{n∈Z} P(X = n)·n = (1/C) Σ_{n≠0} n/n² = (1/C) Σ_{n≠0} 1/n.
32 Discrete Random Variables
Unfortunately, the last expression is ambiguous, since its value depends on the order in which the terms are included in the sum. For example, if we first add the positive terms and then the negative terms, we obtain the difference
Σ_{n≠0} 1/n "=" Σ_{n≥1} 1/n − Σ_{n≥1} 1/n = ∞ − ∞,
which is undefined. Alternatively, if we group the terms by absolute value and sum in order of increasing magnitude, then we obtain
Σ_{n≥1} (1/n − 1/n) = Σ_{n≥1} 0 = 0.
In fact, given any real number x ∈ R, it is possible to order the terms in this series so that the sum is equal to x. This shows that our previous definition of the expectation cannot be automatically extended to variables that take on infinitely many values, since the infinite series might not converge or, if it does, its value might depend on the order in which we list the possible values.
33 Discrete Random Variables
Interlude: Infinite Series
We begin by recalling what it means for a sequence of real numbers to converge to a limit.
Definition. A sequence of real numbers (x_n : n ≥ 1) is said to converge to the limit x ∈ R, written
x = lim_{n→∞} x_n,
if for every ε > 0 there exists a positive integer N_ε such that for every n ≥ N_ε we have |x − x_n| < ε.
Example: The sequence (1/n : n ≥ 1) converges to the limit x = 0, since for any ε > 0 we may take N_ε = ⌈1/ε⌉ + 1, and then for every n ≥ N_ε we have
|0 − x_n| = 1/n ≤ 1/N_ε < ε.
34 Discrete Random Variables
Although any sequence of real numbers (x_n : n ≥ 1) can be assembled into a formal series,
Σ_{n=1}^∞ x_n,
the previous example shows that such a sum is not always uniquely defined. For this reason, we need to pick out a special class of series that can be summed.
Definition. An infinite series consisting of the terms (x_n : n ≥ 1) is said to be convergent if the sequence of partial sums s_n = x_1 + ... + x_n is convergent, i.e., if the limit
lim_{n→∞} Σ_{i=1}^n x_i
exists.
35 Discrete Random Variables
Our example also revealed that the value, or even the existence, of the limit of an infinite series may depend on the order of appearance of the terms in that series. This is unacceptable if we wish to use infinite series to define expectations of random variables with countably infinitely many values, since the order in which we list these values is completely arbitrary. Fortunately, there is a large class of infinite series that do not suffer from this ambiguity.
Definition. An infinite series consisting of the terms (x_n : n ≥ 1) is said to be absolutely convergent if the series formed from the absolute values (|x_n| : n ≥ 1) is convergent, i.e., if the limit
Σ_{n=1}^∞ |x_n| = lim_{n→∞} Σ_{i=1}^n |x_i|
exists. If the series formed from (x_n : n ≥ 1) is convergent, but not absolutely convergent, then we say that it is conditionally convergent.
36 Discrete Random Variables
Example: Consider the alternating series with terms x_n = (−1)^{n+1}/n. This series is convergent, with limit
ln(2) = lim_{n→∞} Σ_{k=1}^n x_k,
but it is only conditionally convergent, since the limit
lim_{n→∞} Σ_{k=1}^n |x_k| = Σ_{k=1}^∞ 1/k = ∞
is infinite.
37 Discrete Random Variables
There is a profound difference between absolutely and conditionally convergent series that has a direct impact on our ability to define expectations. This is highlighted by the next two theorems.
Theorem. Suppose that (x_n : n ≥ 1) are the terms of an absolutely convergent series and let (y_n : n ≥ 1) be a rearrangement of these terms. Then (y_n : n ≥ 1) also forms an absolutely convergent series, and the limit of the series does not depend on the order in which we sum the terms:
Σ_{n=1}^∞ x_n = Σ_{n=1}^∞ y_n.
In particular, if (a_n : n ≥ 1) is the sequence of non-negative terms in (x_n : n ≥ 1) and (−b_n : n ≥ 1) is the sequence of negative terms, both listed in order of appearance, then the series Σ a_n and Σ b_n are both convergent and
Σ_{n=1}^∞ x_n = Σ_{n≥1} a_n − Σ_{n≥1} b_n.
38 Discrete Random Variables
The previous theorem states that absolutely convergent series are well behaved, in the sense that their limits do not depend on the order in which we sum their terms. The next theorem shows that absolute convergence is also a necessary condition for this to be true.
Theorem. Suppose that the series Σ x_n is conditionally convergent and let y ∈ R be a real number. Then there is a rearrangement of the sequence (x_n : n ≥ 1), say (y_n : n ≥ 1), such that the series Σ y_n converges to y, i.e.,
y = lim_{n→∞} Σ_{k=1}^n y_k.
In other words, a conditionally convergent series can be rearranged so that it converges to any limit that we like.
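The standard proof of this theorem is greedy, and the greed can be simulated: take unused positive terms until the partial sum exceeds the target, then unused negative terms until it drops below, and repeat; since the terms tend to 0, the oscillations shrink onto the target. A sketch for the alternating harmonic series (Python; names mine):

```python
def rearranged_sum(target, n_terms=100_000):
    """Reorder the terms +1/1, -1/2, +1/3, -1/4, ... greedily so that the
    partial sums approach `target` (every term is used in the long run)."""
    total, p, q = 0.0, 1, 2
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / p  # next unused positive term, +1/odd
            p += 2
        else:
            total -= 1.0 / q  # next unused negative term, -1/even
            q += 2
    return total

print(rearranged_sum(3.0))   # close to 3, though the usual sum is ln 2
print(rearranged_sum(-1.0))  # the same terms, reordered, approach -1
```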
39 Discrete Random Variables
These last two theorems lead us to the following definition of the expectation of a random variable that takes on countably infinitely many values.
Definition. Suppose that X is a random variable with values in a countably infinite set S = {x_1, x_2, ...} ⊆ R and let p(x_k) = p_k be the probability mass function of X. Then the expectation of X is defined to be the quantity
E[X] = Σ_{k=1}^∞ p_k x_k = lim_{n→∞} Σ_{k=1}^n p_k x_k,
provided that the series Σ p_k x_k is absolutely convergent, i.e., provided that
E[|X|] = Σ_{k=1}^∞ p_k |x_k| = lim_{n→∞} Σ_{k=1}^n p_k |x_k| < ∞.
If this condition is not satisfied, then we say that the expectation of X does not exist.
40 Discrete Random Variables
Example: Let N be the random variable with values in the natural numbers N = {0, 1, 2, ...} and probability mass function p(n) = 2^{−(n+1)}. Since N only takes on non-negative values, we need only check that the series Σ_{n≥0} p_n n is convergent. This follows from the calculation below, in which we write n = Σ_{k=1}^n 1 and exchange the order of summation:
Σ_{n=0}^∞ 2^{−(n+1)} n = Σ_{n=0}^∞ Σ_{k=1}^n 2^{−(n+1)} = Σ_{k=1}^∞ Σ_{n=k}^∞ 2^{−(n+1)} = Σ_{k=1}^∞ 2^{−k} = 1.
Thus the expectation of N exists and is equal to E[N] = 1.
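The interchange-of-sums computation can be sanity-checked by truncating the series (Python; names mine):

```python
# E[N] = sum over n of n * 2^-(n+1); the tail beyond n = 200 is negligible.
expectation = sum(n * (1 / 2) ** (n + 1) for n in range(200))
print(expectation)  # ~ 1.0, matching the exchange-of-summation argument
```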
41 Discrete Random Variables
Most of the results that we proved about expectations of random variables taking at most finitely many values extend to expectations of random variables taking countably infinitely many values. Here I will prove one such result and state several others (see the text for proofs).
Theorem. Suppose that X is a random variable having an expectation. Let k and b be constants and define Y = kX + b. Then Y has an expectation and
E[Y] = kE[X] + b.
Proof: Suppose that X takes values in the set S = {x_1, x_2, ...} and let p_i = P(X = x_i). Then
E[|Y|] = Σ_{i=1}^∞ p_i |k x_i + b| ≤ Σ_{i=1}^∞ p_i (|k||x_i| + |b|) = |k| Σ_{i=1}^∞ p_i |x_i| + |b| Σ_{i=1}^∞ p_i = |k| E[|X|] + |b| < ∞.
42 Discrete Random Variables
This calculation shows that E[Y] exists. Furthermore, its value is
E[Y] = Σ_{i=1}^∞ p_i (k x_i + b)
     = lim_{n→∞} Σ_{i=1}^n p_i (k x_i + b)
     = lim_{n→∞} ( k Σ_{i=1}^n p_i x_i + b Σ_{i=1}^n p_i )
     = k lim_{n→∞} Σ_{i=1}^n p_i x_i + b lim_{n→∞} Σ_{i=1}^n p_i
     = k Σ_{i=1}^∞ p_i x_i + b
     = kE[X] + b.
43 Discrete Random Variables
The remaining properties are stated as theorems.
Theorem.
1 Suppose that X and Y are random variables and that the expectations E[X] and E[Y] both exist. Then the expectation of X + Y exists and is equal to
E[X + Y] = E[X] + E[Y].
2 More generally, suppose that the expectations of the random variables X_1, ..., X_n exist and let c_1, ..., c_n be constants. Then the expectation of the variable c_1 X_1 + ... + c_n X_n exists and is equal to
E[Σ_{i=1}^n c_i X_i] = Σ_{i=1}^n c_i E[X_i].
In other words, expectation remains linear even when extended to random variables taking countably infinitely many values.
44 Discrete Random Variables
Theorem. Suppose that X is a random variable with countably infinitely many values in the set S = {x_1, x_2, ...} and let g : S → R. Then Y = g(X) has expectation
E[Y] = Σ_{i=1}^∞ P(X = x_i) g(x_i),
provided that E[|Y|] < ∞.
Remark: The existence of E[X] is not enough to guarantee the existence of E[g(X)]. For example, if P(N = n) = 2^{−(n+1)}, then we know that E[N] = 1 exists. However, if g(n) = (−2)^n, then
E[|g(N)|] = Σ_{n=0}^∞ 2^{−(n+1)} |(−2)^n| = Σ_{n=0}^∞ 1/2 = ∞,
and so E[g(N)] does not exist.
45 Discrete Random Variables
Theorem. Let X and Y be random variables taking at most countably many values and suppose that E[X] and E[X | Y = y_i] exist for all possible values y_i of Y. Then the random variable E[X | Y] has an expectation and
E[X] = E[E[X | Y]].
This result is sometimes known as the law of iterated expectations.
Theorem. Let X and Y be independent random variables and suppose that the expectations of g(X) and h(Y) exist, where g and h are functions. Then g(X) and h(Y) are independent random variables, and the expectation of g(X)h(Y) exists and is equal to
E[g(X)h(Y)] = E[g(X)]E[h(Y)].
46 Probability Generating Functions
Probability Generating Functions
Definition. Let X be a random variable with values in the natural numbers N = {0, 1, ...}. The probability generating function of X is the function ψ_X defined by
ψ_X(t) = E[t^X] = Σ_{n=0}^∞ P(X = n) t^n
for those values of t ∈ R such that the series on the right-hand side converges. The set of all such t is called the domain of ψ_X.
Remark: The probability generating function is an alternative way of encoding information about the distribution of a random variable. Our aim is to learn about the distribution by studying the properties of the probability generating function.
47 Probability Generating Functions
Example: If X is a binomial random variable with parameters n and p, then the probability generating function of X is
ψ_X(t) = E[t^X] = Σ_{k=0}^n P(X = k) t^k
       = Σ_{k=0}^n C(n,k) p^k (1 − p)^{n−k} t^k
       = Σ_{k=0}^n C(n,k) (pt)^k (1 − p)^{n−k}
       = (pt + 1 − p)^n,
by the binomial theorem, and the domain of ψ_X is the entire real line.
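The closed form can be checked against the defining sum for particular n, p, and t (Python; `math.comb` is the standard-library binomial coefficient; other names are mine):

```python
from math import comb

def pgf_binomial_sum(n, p, t):
    """psi_X(t) = sum_k C(n,k) p^k (1-p)^(n-k) t^k, straight from the definition."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * t**k
               for k in range(n + 1))

n, p = 10, 0.3
for t in (-2.0, 0.0, 0.5, 1.0, 3.0):
    closed = (p * t + 1 - p) ** n
    assert abs(pgf_binomial_sum(n, p, t) - closed) < 1e-9
print("sum matches (pt + 1 - p)^n")
```

Note that ψ_X(1) = 1 automatically, since at t = 1 the sum is just the total probability.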
48 Probability Generating Functions
Because the probability generating function is defined by a power series expansion, its domain is at least as large as the interval of convergence of that series. Recall that the radius of convergence of a power series φ(x) = Σ c_n x^n is the largest number ρ such that the series converges for all x with |x| < ρ:
ρ = sup{ r ≥ 0 : Σ_{n=0}^∞ |c_n| r^n < ∞ }.
There are many methods that can be used to determine the radius of convergence of a power series, but one of these is the so-called ratio test.
Theorem. Let φ(x) = Σ_n c_n x^n be a power series and suppose that the limit
ρ = lim_{n→∞} |c_n / c_{n+1}|
exists. Then ρ is the radius of convergence of this power series.
49 Probability Generating Functions
Example: Let X be a natural number-valued random variable with distribution
P(X = n) = 0 if n = 0, and (6/π²)(1/n²) if n ≥ 1,
and let ψ_X be the probability generating function of X:
ψ_X(t) = (6/π²) Σ_{n=1}^∞ t^n/n².
To apply the ratio test, we calculate the limit
ρ = lim_{n→∞} |c_n / c_{n+1}| = lim_{n→∞} (n + 1)²/n² = 1,
which shows that the radius of convergence is 1. Since ψ_X(1) < ∞ and ψ_X(−1) < ∞, the domain of ψ_X is [−1, 1].
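Both claims in this example can be checked numerically: the ratio (n+1)²/n² tends to 1, and the endpoint value ψ_X(1) is the total probability, so it must equal 1 (Python; names mine):

```python
import math

# Ratio test for c_n = 1/n^2 (the constant 6/pi^2 cancels in the ratio):
ratios = [(1 / n**2) / (1 / (n + 1) ** 2) for n in (10, 100, 10_000)]
print(ratios)  # (n+1)^2/n^2, tending to 1

# Endpoint t = 1: the series converges to psi_X(1) = 1, as any pmf must.
psi_at_1 = (6 / math.pi**2) * sum(1 / n**2 for n in range(1, 200_000))
print(psi_at_1)  # ~ 1
```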
50 Probability Generating Functions
An important property of probability generating functions is that they uniquely determine the distribution of a random variable.
Theorem. Suppose that X and Y are natural number-valued random variables with identical probability generating functions, i.e.,
ψ_X(t) = E[t^X] = E[t^Y] = ψ_Y(t),
and both functions have the same domain. Then X and Y have the same distribution, i.e.,
P(X = n) = P(Y = n)
for every n ≥ 0.
Remark: Notice that the theorem only asserts that X and Y have the same distribution, not that X = Y. In such cases we say that X and Y are identical in distribution and we write X =_d Y.
Proof: Because the radius of convergence of \psi_X and \psi_Y is greater than or equal to 1, we can perform the following differentiations:

\frac{d^n}{dt^n} \psi_X(t) \Big|_{t=0} = \sum_{k=0}^{\infty} P(X = k) \frac{d^n}{dt^n} t^k \Big|_{t=0} = \sum_{k=n}^{\infty} P(X = k) \frac{k!}{(k-n)!}\, t^{k-n} \Big|_{t=0} = n!\, P(X = n).

Similarly,

\frac{d^n}{dt^n} \psi_Y(t) \Big|_{t=0} = n!\, P(Y = n),

but since \psi_X = \psi_Y, the two functions have identical derivatives of all orders and so P(X = n) = P(Y = n).
Probability generating functions of sums of independent random variables are particularly well behaved.

Theorem. Suppose that X_1, \dots, X_n are independent natural number-valued random variables with probability generating functions \psi_{X_1}, \dots, \psi_{X_n}. Then the probability generating function of X = X_1 + \cdots + X_n is

\psi_X(t) = \prod_{i=1}^{n} \psi_{X_i}(t),

and the domain of \psi_X is the intersection of the domains of the functions \psi_{X_1}, \dots, \psi_{X_n}.

Proof: Since X_1, \dots, X_n are independent, so are the random variables t^{X_1}, \dots, t^{X_n} for every value of t. Consequently,

\psi_X(t) = E\left[ t^{X_1 + \cdots + X_n} \right] = E\left[ \prod_{i=1}^{n} t^{X_i} \right] = \prod_{i=1}^{n} E\left[ t^{X_i} \right] = \prod_{i=1}^{n} \psi_{X_i}(t),

provided that t is contained in the domain of each of the functions \psi_{X_i}.
Example: Let X_1, \dots, X_n be independent Bernoulli random variables, each with parameter p, and let X = X_1 + \cdots + X_n. Since X_1, \dots, X_n all have the same probability generating function \psi_{X_i}(t) = 1 - p + pt, it follows that the probability generating function of X is

\psi_X(t) = \prod_{i=1}^{n} \psi_{X_i}(t) = (1 - p + pt)^n.

Since this is the same probability generating function that we found for a binomial random variable with parameters n and p, it follows that X is itself a binomial random variable with these parameters.
Probability generating functions can also be used to calculate the mean and the variance of a random variable.

Theorem. Suppose that X is a random variable with probability generating function \psi_X and assume that the radius of convergence of \psi_X is greater than 1. Then the mean and the variance of X are equal to

E[X] = \psi_X'(1),
Var(X) = \psi_X''(1) + \psi_X'(1) - \left( \psi_X'(1) \right)^2.
Proof: Provided that the radius of convergence is greater than 1, we can differentiate inside the series:

\psi_X'(1) = \sum_{k=0}^{\infty} P(X = k) \frac{d}{dt} t^k \Big|_{t=1} = \sum_{k=1}^{\infty} P(X = k)\, k t^{k-1} \Big|_{t=1} = \sum_{k=0}^{\infty} P(X = k)\, k = E[X].
Similarly,

\psi_X''(1) = \sum_{k=0}^{\infty} P(X = k) \frac{d^2}{dt^2} t^k \Big|_{t=1} = \sum_{k=2}^{\infty} P(X = k)\, k(k-1) = E[X^2] - E[X],

which shows that

Var(X) = E[X^2] - E[X]^2 = \psi_X''(1) + \psi_X'(1) - \left( \psi_X'(1) \right)^2.
Example: If X is a binomial random variable with parameters n and p, then the probability generating function of X is \psi_X(t) = (1 - p + pt)^n, which has derivatives \psi'(1) = np and \psi''(1) = n(n-1)p^2. Consequently,

E[X] = np

and

Var(X) = \psi''(1) + \psi'(1) - \left( \psi'(1) \right)^2 = n(n-1)p^2 + np - n^2 p^2 = np(1-p).

These results agree with those that we previously calculated through more direct means.
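The theorem can also be checked numerically: estimate \psi'(1) and \psi''(1) by central finite differences and recover np and np(1-p). This is a sketch with illustrative parameters n = 10, p = 0.3 (expected mean 3, variance 2.1); the function names are my own:

```python
def pgf(t, n=10, p=0.3):
    """PGF of Binomial(10, 0.3): (pt + 1 - p)^n."""
    return (p * t + 1 - p)**n

def mean_var_from_pgf(psi, h=1e-4):
    """Estimate E[X] = psi'(1) and Var(X) = psi''(1) + psi'(1) - psi'(1)^2
    using central finite differences at t = 1."""
    d1 = (psi(1 + h) - psi(1 - h)) / (2 * h)
    d2 = (psi(1 + h) - 2 * psi(1) + psi(1 - h)) / h**2
    return d1, d2 + d1 - d1**2
```

The finite-difference step h trades truncation error against roundoff; h around 1e-4 is accurate to several digits here.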
Geometric and related distributions

The Geometric Distribution

Suppose that a series of independent trials is performed and that each trial has probability p of failing. If X is the number of successes that occur before the first failure, then

P(X = n) = (1 - p)^n p for n \ge 0.

This distribution is important enough to have its own name.

Definition. A random variable X with values in the natural numbers is said to have the geometric distribution with parameter p, written X \sim Geometric(p), if the probability mass function of X is P(X = n) = (1 - p)^n p.
Exercise: Suppose that X \sim Geometric(p). Find the probability generating function \psi_X(t) = E[t^X] and use this to calculate the mean and the variance of X.

Solution: The probability generating function of X is

\psi_X(t) = \sum_{n=0}^{\infty} p (1-p)^n t^n = \frac{p}{1 - t(1-p)}.

Since

\psi'(t) = \frac{p(1-p)}{(1 - t(1-p))^2} \quad \text{and} \quad \psi''(t) = \frac{2p(1-p)^2}{(1 - t(1-p))^3},

it follows that \psi'(1) = \frac{1-p}{p} and \psi''(1) = \frac{2(1-p)^2}{p^2}. Therefore

E[X] = \frac{1-p}{p}

and

Var(X) = \frac{2(1-p)^2}{p^2} + \frac{1-p}{p} - \left( \frac{1-p}{p} \right)^2 = \frac{(1-p)^2}{p^2} + \frac{1-p}{p} = \frac{1-p}{p^2}.
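These formulas can be verified by brute-force summation over a truncated geometric pmf; with p = 0.25 we expect E[X] = 3 and Var(X) = 12 (a sketch; the truncation point is chosen so the neglected tail is negligible):

```python
p = 0.25
probs = [(1 - p)**n * p for n in range(2000)]   # pmf, truncated deep in the tail

total = sum(probs)
mean = sum(n * q for n, q in enumerate(probs))
var = sum(n**2 * q for n, q in enumerate(probs)) - mean**2

# PGF check at an interior point: p / (1 - t(1-p)) at t = 0.5
t = 0.5
pgf_series = sum(q * t**n for n, q in enumerate(probs))
pgf_closed = p / (1 - t * (1 - p))
```

Since (0.75)^2000 is astronomically small, the truncation error is far below floating-point precision.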
The geometric distribution can be used to model random lifespans when the probability of death or failure per unit time is constant. In fact, the geometric distribution is said to be memoryless because the probability of survival over a period [t, t+s] depends only on the duration of the period and not on its starting time:

P(X > t + s \mid X > t) = \frac{P(X > t + s)}{P(X > t)} = \frac{(1-p)^{t+s}}{(1-p)^t} = (1-p)^s = P(X > s).

In other words, knowing that an individual has survived until time t makes it neither more nor less probable that they will survive for an additional s units of time.
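The memoryless property is easy to confirm numerically using the tail probabilities P(X \ge n) = (1-p)^n (a sketch; it uses the "at least n" form of the tail, which gives exactly the cancellation shown above):

```python
p = 0.2

def survival(n):
    """P(X >= n) = (1 - p)^n for X ~ Geometric(p)."""
    return (1 - p)**n

t, s = 4, 7
conditional = survival(t + s) / survival(t)   # P(X >= t+s | X >= t)
```

The conditional survival probability equals the unconditional one, for any choice of t and s.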
Example: Suppose that we believe that the lifespan of an electronic component can be modeled by a geometric distribution with an unknown parameter p and that we measure the lifespans of m copies of the component in order to estimate p. Let X_1 = x_1, \dots, X_m = x_m be the observed lifespans and assume that these are independent. The likelihood function for p given the data D = (x_1, \dots, x_m) is

L(p \mid D) = P_p(X_1 = x_1, \dots, X_m = x_m) = \prod_{i=1}^{m} P_p(X_i = x_i) = \prod_{i=1}^{m} p(1-p)^{x_i} = p^m (1-p)^x,

where x = x_1 + \cdots + x_m. The notation P_p was used above to indicate that we are calculating the probability of the data under the assumption that the parameter of the geometric distribution is p.
The likelihood function tells us how the probability of the data varies with our choice of the parameter p. One way to select a point estimate of p is to choose the value of p that maximizes the probability of the data. This estimate is called the maximum likelihood estimate of p and can be found by maximizing the function L(p \mid D). To this end, we differentiate L(p \mid D) with respect to p and set the result equal to 0:

0 = \frac{d}{dp} L(p \mid D) = \frac{d}{dp} \left( p^m (1-p)^x \right) = m p^{m-1} (1-p)^x - x p^m (1-p)^{x-1} = p^m (1-p)^x \left( \frac{m}{p} - \frac{x}{1-p} \right).

Solving for p shows that the maximum likelihood estimate of p is

\hat{p}_{ML} = \frac{m}{m + x}.
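A grid search over the log-likelihood confirms the closed-form maximizer m/(m + x). This is a sketch with made-up data (m = 10 components with total observed lifespan x = 37, so \hat{p}_{ML} = 10/47):

```python
from math import log

def log_likelihood(p, m, x):
    """Log of the likelihood p^m (1-p)^x."""
    return m * log(p) + x * log(1 - p)

m, x = 10, 37   # hypothetical data
grid = [i / 10_000 for i in range(1, 10_000)]   # p in (0, 1)
p_hat = max(grid, key=lambda p: log_likelihood(p, m, x))
```

Working with the log-likelihood avoids underflow and does not change the location of the maximum.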
If we use the geometric distribution to model random lifespans, then we are implicitly assuming that a single failure is sufficient to cause death or system collapse. However, many systems are robust in the sense that multiple independent components must fail for death to result. To model the lifespan of such a system we will introduce a more general class of distributions.

Definition. A random variable X with values in the natural numbers is said to have the negative binomial distribution with parameters r \ge 1 and p \in [0, 1], written X \sim NB(r, p), if the probability mass function of X is

P(X = n) = \binom{n + r - 1}{n} p^r (1-p)^n.

Remark: The negative binomial distribution with parameters r = 1 and p \in [0, 1] is just the geometric distribution with parameter p.
The negative binomial distribution arises in the following way. Suppose that a sequence of independent trials is performed and that each trial has probability p of failure. If X is the number of successes that occur before the r-th failure, then X \sim NB(r, p).

To verify this claim, observe that the event {X = n} occurs if the first n + r - 1 trials result in r - 1 failures and n successes, and the (n + r)-th trial results in a failure. The probability of n successes in the first n + r - 1 trials is given by the binomial probability

P(n \text{ successes in the first } n + r - 1 \text{ trials}) = \binom{n + r - 1}{n} p^{r-1} (1-p)^n.

Furthermore, since the outcome of the (n + r)-th trial is independent of the first n + r - 1 trials, it follows that

P(X = n) = \binom{n + r - 1}{n} p^{r-1} (1-p)^n \cdot p = \binom{n + r - 1}{n} p^r (1-p)^n,

which is the negative binomial distribution.
Suppose that X_1, \dots, X_r are independent geometric random variables with parameter p and let X = X_1 + \cdots + X_r. Then X \sim NB(r, p). Indeed, if we interpret X_1 as the number of successes that occur until the first failure, X_2 as the number of successes that occur until the second failure, etc., then it is clear that X is the number of successes that occur until the cumulative number of failures is equal to r.

This observation makes it easy to calculate the probability generating function of X:

\psi_X(t) = E[t^X] = \prod_{i=1}^{r} \psi_{X_i}(t) = \left( \frac{p}{1 - t(1-p)} \right)^r.

Furthermore, by differentiating \psi_X(t), we can calculate the mean and the variance of X, which are

E[X] = \frac{r(1-p)}{p}, \qquad Var(X) = \frac{r(1-p)}{p^2}.
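The sum-of-geometrics representation suggests a direct simulation check of these moment formulas. This is a sketch with illustrative parameters r = 5, p = 0.3 (expected mean 35/3 and variance 350/9); the function names are my own:

```python
import random

rng = random.Random(42)

def geometric(p):
    """Number of successes before the first failure (failure has prob p)."""
    n = 0
    while rng.random() >= p:   # a success occurs with probability 1 - p
        n += 1
    return n

r, p = 5, 0.3
samples = [sum(geometric(p) for _ in range(r)) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean)**2 for s in samples) / (len(samples) - 1)
```

With 20,000 replicates the Monte Carlo error in the mean is a few hundredths, so agreement with r(1-p)/p and r(1-p)/p^2 is visible at that resolution.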
Poisson distribution

The Poisson Distribution

Suppose that we perform a large number of independent trials, say n \ge 100, and that the probability of a success on any one trial is small, say p_n = \lambda/n \ll 1. If we let X^{(n)} denote the total number of successes that occur in all n trials, then X^{(n)} \sim Binomial(n, p_n) and therefore

P(X^{(n)} = k) = \binom{n}{k} p_n^k (1 - p_n)^{n-k}
= \frac{n!}{(n-k)!\, k!} \left( \frac{\lambda}{n} \right)^k \left( 1 - \frac{\lambda}{n} \right)^{n-k}
= \left( \frac{n!}{(n-k)!\, n^k} \right) \left( \frac{\lambda^k}{k!} \right) \left( 1 - \frac{\lambda}{n} \right)^{n} \left( 1 - \frac{\lambda}{n} \right)^{-k}
= \left( \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \right) \left( \frac{\lambda^k}{k!} \right) \left( 1 - \frac{\lambda}{n} \right)^{n} \left( 1 - \frac{\lambda}{n} \right)^{-k}.
Notice that three of the terms in the last line depend on n and that these converge to finite limits as n \to \infty:

\lim_{n \to \infty} \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} = 1,
\lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{-k} = 1,
\lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{n} = e^{-\lambda}.

Consequently, for every integer k \ge 0, the probabilities P(X^{(n)} = k) converge to a limit as n \to \infty, which is

\lim_{n \to \infty} P(X^{(n)} = k) = e^{-\lambda} \left( \frac{\lambda^k}{k!} \right).
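This convergence is visible numerically: for fixed \lambda, the Binomial(n, \lambda/n) pmf approaches the Poisson(\lambda) pmf as n grows (a sketch with \lambda = 2; the function names are my own):

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
# worst-case pmf discrepancy over k = 0..7, for small and large n
gap_small_n = max(abs(binom_pmf(10, lam / 10, k) - poisson_pmf(lam, k)) for k in range(8))
gap_large_n = max(abs(binom_pmf(10_000, lam / 10_000, k) - poisson_pmf(lam, k)) for k in range(8))
```

The discrepancy shrinks roughly like 1/n, so the n = 10,000 gap is orders of magnitude smaller than the n = 10 gap.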
To verify the third limit on the preceding page, recall that the Taylor series for \log(1 + x) is

\log(1 + x) = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n},

which converges as long as x \in (-1, 1]. In particular, when |x| \ll 1, we can write \log(1 + x) = x + O(x^2), where O(x^2) stands for a remainder term that is bounded by a constant times x^2. Therefore,

\lim_{n \to \infty} \log \left( 1 - \frac{\lambda}{n} \right)^{n} = \lim_{n \to \infty} n \log \left( 1 - \frac{\lambda}{n} \right) = \lim_{n \to \infty} n \left( -\frac{\lambda}{n} + O(n^{-2}) \right) = -\lambda.

However, since e^x is continuous on (-\infty, \infty), we can exponentiate both sides of this identity to obtain

\lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{n} = e^{-\lambda}.
Furthermore, the limiting values of the probabilities sum to 1 when k is allowed to range over all of the natural numbers:

\sum_{k=0}^{\infty} e^{-\lambda} \left( \frac{\lambda^k}{k!} \right) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.

These observations motivate the following definition:

Definition. A random variable X is said to have the Poisson distribution with parameter \lambda \ge 0 if X takes values in the non-negative integers with probability mass function

p_X(k) = P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}.

In this case we write X \sim Poisson(\lambda).

Remark: The Poisson distribution takes its name from that of the 19th-century French mathematician Siméon Denis Poisson (1781-1840).
The probability generating function of the Poisson distribution is

\psi_X(t) = E[t^X] = \sum_{k=0}^{\infty} p_X(k)\, t^k = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!} = e^{-\lambda} e^{\lambda t} = e^{\lambda(t-1)}.

Differentiating twice with respect to t gives

\psi_X'(t) = \lambda e^{\lambda(t-1)}, \qquad \psi_X''(t) = \lambda^2 e^{\lambda(t-1)},

and we then find

E[X] = \psi_X'(1) = \lambda,
Var(X) = \psi_X''(1) + \psi_X'(1) - \left( \psi_X'(1) \right)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.

Thus, \lambda is equal to both the mean and the variance of the Poisson distribution.
It has long been recognized that Poisson distributions provide a surprisingly accurate model for the statistics of a large number of seemingly unrelated phenomena. Some examples include:

- the number of misprints per page of a book;
- the number of wrong telephone numbers dialed in a day;
- the number of customers entering a post office per day;
- the number of mutations that occur when a genome is replicated;
- the number of \alpha-particles discharged per day from a ^{14}C source;
- the number of major earthquakes per year;
- the number of Prussian soldiers killed per year by being kicked by a horse.
For example, even the number of vacancies per year on the US Supreme Court is reasonably well modeled by a Poisson distribution:

[Table: observed and expected probabilities, and observed and expected numbers of years, with x = 0, 1, 2, ... vacancies; the numerical entries were lost in extraction.]

Data from Cole (2010) compared with a Poisson distribution with \lambda = 0.5.
Since these phenomena are generated by very different physical and biological processes, the fact that they share similar statistical properties cannot be explained by the specific mechanisms that operate in each instance. Instead, the widespread emergence of the Poisson distribution appears to be a consequence of the following more general mathematical result, which is commonly known as the Law of Rare Events.

Theorem. For each n \ge 1, let X_1^{(n)}, \dots, X_n^{(n)} be a collection of independent Bernoulli random variables, each with success probability p_n = \lambda/n, and let X^{(n)} = X_1^{(n)} + \cdots + X_n^{(n)} be the total number of successes in these n trials. Then

\lim_{n \to \infty} P(X^{(n)} = k) = e^{-\lambda} \left( \frac{\lambda^k}{k!} \right).

Interpretation: When n is large, the probability of success p_n = \lambda/n is small and so each success is a rare event. However, since there are many trials, there is a non-negligible probability of having at least one success, and the distribution of the total number of successes is approximately Poisson with parameter \lambda.
We previously proved the law of rare events by directly calculating the limits of the probabilities P(X^{(n)} = k) and showing that these coincide with the probabilities given by a Poisson distribution. However, we can also prove this result with the help of probability generating functions. The following theorem provides the essential tool.

Theorem. For each n \ge 1, let X^{(n)} be a non-negative integer-valued random variable with probability generating function \psi_n(t), and suppose that these functions converge pointwise on the interval (-1, 1), i.e., the limit

\psi(t) = \lim_{n \to \infty} \psi_n(t)

exists for all t \in (-1, 1). Then \psi(t) is the probability generating function of a random variable X with values in the natural numbers and

\lim_{n \to \infty} P(X^{(n)} = k) = P(X = k)

for every integer k \ge 0.
Proof of the Law of Rare Events: Since X^{(n)} \sim Binomial(n, p_n), we know that the probability generating function of X^{(n)} is

\psi_n(t) = E\left[ t^{X^{(n)}} \right] = \left( 1 - \frac{\lambda}{n} + \frac{\lambda t}{n} \right)^n.

However, the pointwise limit of these functions as n tends to infinity is the function

\psi(t) = \lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} + \frac{\lambda t}{n} \right)^n = e^{\lambda(t-1)},

and convergence occurs over the entire real line, i.e., for all values of t. Since \psi(t) is the probability generating function of the Poisson distribution with parameter \lambda, it follows that the probabilities P(X^{(n)} = k) converge to those of this Poisson distribution.
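The pointwise convergence of the binomial PGFs to e^{\lambda(t-1)} can be observed directly by evaluating both at a fixed t for increasing n (a sketch with illustrative values \lambda = 1.5, t = 0.4):

```python
from math import exp

def psi_n(t, n, lam):
    """PGF of Binomial(n, lam/n): (1 - lam/n + lam*t/n)^n."""
    return (1 - lam / n + lam * t / n)**n

lam, t = 1.5, 0.4
limit = exp(lam * (t - 1))   # PGF of Poisson(lam)
errors = [abs(psi_n(t, n, lam) - limit) for n in (10, 100, 10_000)]
```

The error shrinks roughly like 1/n, mirroring the rate seen for the pmfs themselves.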
Probability generating functions can also be used to prove the following theorem.

Theorem. Suppose that X_1, \dots, X_n is a collection of independent Poisson-distributed random variables with parameters \lambda_1, \dots, \lambda_n, respectively, and let X = X_1 + \cdots + X_n. Then X is Poisson-distributed with parameter \lambda_1 + \cdots + \lambda_n.

Proof: If \psi_i(t) = e^{\lambda_i(t-1)} is the probability generating function of X_i, then because the X_i are independent, we know that the probability generating function of X is

\psi_X(t) = \prod_{i=1}^{n} \psi_i(t) = \prod_{i=1}^{n} e^{\lambda_i(t-1)} = e^{(t-1) \sum_{i=1}^{n} \lambda_i}.

Since this is also the probability generating function of a Poisson-distributed random variable with parameter \lambda_1 + \cdots + \lambda_n, it follows that this is the distribution of X.
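The key algebraic step — a product of exponentials collapsing into a single exponential with the summed rate — is easy to verify numerically (a sketch with illustrative rates 0.5, 1.2, and 2.3):

```python
from math import exp

def poisson_pgf(lam, t):
    """PGF of Poisson(lam): exp(lam * (t - 1))."""
    return exp(lam * (t - 1))

lams = [0.5, 1.2, 2.3]
t = 0.7

product = 1.0
for lam in lams:
    product *= poisson_pgf(lam, t)
```

By the uniqueness theorem for PGFs, this identity is exactly what pins down the distribution of the sum.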
Fluctuation Tests

Fluctuation tests and the origin of adaptive mutations

One of the classic experiments of molecular genetics is the fluctuation test, which was developed by Salvador Luria and Max Delbrück in 1943 to investigate the origins of adaptive mutations. An adaptive mutation is one that increases the fitness (e.g., survival, fecundity) of an individual that carries that mutation.

At the time, the molecular processes underpinning heredity and mutation were unknown (e.g., the structure of DNA was only described in 1953). There were two prevailing hypotheses explaining the origins of adaptive mutations: the spontaneous mutation hypothesis and the induced mutation hypothesis.

- According to the spontaneous mutation hypothesis, adaptive mutations occur by chance, irrespective of the environmental conditions.
- According to the induced mutation hypothesis, adaptive mutations are directly induced by the environmental conditions in which they will be favored.
Luria and Delbrück developed an experimental system based on the bacterium Escherichia coli along with a virus (T1 bacteriophage) that infects it. When T1 phage is added to a culture of E. coli, most of the bacteria are killed, but a few resistant cells may survive and give rise to resistant colonies that can be seen on the surface of a petri dish. This shows that resistance to T1 phage is a trait that varies across E. coli bacteria and which is heritable, i.e., the descendants of resistant bacteria are usually themselves resistant.

[Images of E. coli and of resistant colonies on a plate; sources: Wikipedia and Madeleine Price Ball.]
The experiment carried out by Luria and Delbrück consisted of the following steps:

1. An E. coli culture was initiated from a single T1-susceptible cell and allowed to grow to a population containing millions of bacteria.
2. Several small samples were taken from this colony and spread on agar plates that had also been inoculated with the T1 phage. These plates were left for a period, after which the number of resistant colonies on each plate was counted.
3. The procedures described in steps 1-2 were repeated several times, using independently established E. coli cultures, and the resulting data were used to estimate both the mean and the variance of the number of resistant colonies arising in each culture.
4. If R_{ij} is the number of resistant colonies observed on the j-th plate inoculated with bacteria from the i-th colony, then the mean and the variance of the number of resistant colonies can be estimated by

\bar{R}_i = \frac{1}{5} \sum_{j=1}^{5} R_{ij} \qquad \text{and} \qquad V_i = \frac{1}{4} \sum_{j=1}^{5} \left( R_{ij} - \bar{R}_i \right)^2.
Luria and Delbrück argued that the spontaneous and induced mutation hypotheses could be distinguished in the following manner.

- If mutations are induced, then these will only appear after the bacteria are exposed to the phage. In this case, the number of resistant colonies will be approximately Poisson distributed and the variance will be approximately equal to the mean.
- If mutations are spontaneous, then the number of resistant colonies depends on the timing of the mutation relative to the expansion of the culture. In this case, the variance will be much greater than the mean.

The law of rare events explains why the number of resistant colonies is expected to be Poisson distributed under the induced mutation hypothesis: although there are a large number of cells that can independently mutate to the resistance phenotype, the probability of mutation was known to be low.
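The contrast between the two hypotheses can be illustrated with a deliberately simplified simulation — not Luria and Delbrück's actual model, and all parameters here are illustrative. Under the induced hypothesis each cell mutates independently at exposure (so counts are binomial, hence approximately Poisson); under the spontaneous hypothesis mutations arise during exponential growth, so an early mutant founds a large resistant clone (a "jackpot") and the variance-to-mean ratio blows up:

```python
import random

rng = random.Random(1)

def induced(n_cells=10_000, mu=2e-4):
    """Induced hypothesis: each cell mutates independently only upon
    exposure to the phage, so the count is Binomial(n, mu) ~ Poisson."""
    return sum(1 for _ in range(n_cells) if rng.random() < mu)

def spontaneous(generations=12, mu=1e-4):
    """Spontaneous hypothesis: mutations arise during growth of the
    culture; an early mutant founds a large resistant clone."""
    normal, mutant = 1, 0
    for _ in range(generations):
        daughters = 2 * normal           # every normal cell divides
        new = sum(1 for _ in range(daughters) if rng.random() < mu)
        mutant = 2 * mutant + new        # existing mutant clones also double
        normal = daughters - new
    return mutant

def var_over_mean(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m)**2 for x in xs) / (len(xs) - 1)
    return v / m

induced_ratio = var_over_mean([induced() for _ in range(200)])
spont_ratio = var_over_mean([spontaneous() for _ in range(200)])
```

The induced ratio hovers near 1, as a Poisson count should, while the spontaneous ratio is far larger — the statistical signature that led Luria and Delbrück to conclude that mutations arise spontaneously.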
More information1 INFO Sep 05
Events A 1,...A n are said to be mutually independent if for all subsets S {1,..., n}, p( i S A i ) = p(a i ). (For example, flip a coin N times, then the events {A i = i th flip is heads} are mutually
More informationMeasure and integration
Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.
More informationADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS
ADVANCED CALCULUS - MTH433 LECTURE 4 - FINITE AND INFINITE SETS 1. Cardinal number of a set The cardinal number (or simply cardinal) of a set is a generalization of the concept of the number of elements
More informationIn N we can do addition, but in order to do subtraction we need to extend N to the integers
Chapter The Real Numbers.. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {, 2, 3, }. In N we can do addition, but in order to do subtraction we need to extend
More informationTom Salisbury
MATH 2030 3.00MW Elementary Probability Course Notes Part V: Independence of Random Variables, Law of Large Numbers, Central Limit Theorem, Poisson distribution Geometric & Exponential distributions Tom
More informationAxiomatic set theory. Chapter Why axiomatic set theory?
Chapter 1 Axiomatic set theory 1.1 Why axiomatic set theory? Essentially all mathematical theories deal with sets in one way or another. In most cases, however, the use of set theory is limited to its
More informationProblem Set 2: Solutions Math 201A: Fall 2016
Problem Set 2: s Math 201A: Fall 2016 Problem 1. (a) Prove that a closed subset of a complete metric space is complete. (b) Prove that a closed subset of a compact metric space is compact. (c) Prove that
More informationSuppose that you have three coins. Coin A is fair, coin B shows heads with probability 0.6 and coin C shows heads with probability 0.8.
Suppose that you have three coins. Coin A is fair, coin B shows heads with probability 0.6 and coin C shows heads with probability 0.8. Coin A is flipped until a head appears, then coin B is flipped until
More informationModule 1. Probability
Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive
More informationContinuous Probability Spaces
Continuous Probability Spaces Ω is not countable. Outcomes can be any real number or part of an interval of R, e.g. heights, weights and lifetimes. Can not assign probabilities to each outcome and add
More informationCITS2211 Discrete Structures (2017) Cardinality and Countability
CITS2211 Discrete Structures (2017) Cardinality and Countability Highlights What is cardinality? Is it the same as size? Types of cardinality and infinite sets Reading Sections 45 and 81 84 of Mathematics
More informationIntroduction to Probability
Introduction to Probability Salvatore Pace September 2, 208 Introduction In a frequentist interpretation of probability, a probability measure P (A) says that if I do something N times, I should see event
More informationCS 125 Section #10 (Un)decidability and Probability November 1, 2016
CS 125 Section #10 (Un)decidability and Probability November 1, 2016 1 Countability Recall that a set S is countable (either finite or countably infinite) if and only if there exists a surjective mapping
More informationLecture 6: The Pigeonhole Principle and Probability Spaces
Lecture 6: The Pigeonhole Principle and Probability Spaces Anup Rao January 17, 2018 We discuss the pigeonhole principle and probability spaces. Pigeonhole Principle The pigeonhole principle is an extremely
More informationProbability Review. Gonzalo Mateos
Probability Review Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ September 11, 2018 Introduction
More informationn if n is even. f (n)=
6 2. PROBABILITY 4. Countable and uncountable Definition 32. An set Ω is said to be finite if there is an n N and a bijection from Ω onto [n]. An infinite set Ω is said to be countable if there is a bijection
More informationNotes on ordinals and cardinals
Notes on ordinals and cardinals Reed Solomon 1 Background Terminology We will use the following notation for the common number systems: N = {0, 1, 2,...} = the natural numbers Z = {..., 2, 1, 0, 1, 2,...}
More informationMath Bootcamp 2012 Miscellaneous
Math Bootcamp 202 Miscellaneous Factorial, combination and permutation The factorial of a positive integer n denoted by n!, is the product of all positive integers less than or equal to n. Define 0! =.
More informationWe will briefly look at the definition of a probability space, probability measures, conditional probability and independence of probability events.
1 Probability 1.1 Probability spaces We will briefly look at the definition of a probability space, probability measures, conditional probability and independence of probability events. Definition 1.1.
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationPOL502: Foundations. Kosuke Imai Department of Politics, Princeton University. October 10, 2005
POL502: Foundations Kosuke Imai Department of Politics, Princeton University October 10, 2005 Our first task is to develop the foundations that are necessary for the materials covered in this course. 1
More informationBINOMIAL DISTRIBUTION
BINOMIAL DISTRIBUTION The binomial distribution is a particular type of discrete pmf. It describes random variables which satisfy the following conditions: 1 You perform n identical experiments (called
More informationCountability. 1 Motivation. 2 Counting
Countability 1 Motivation In topology as well as other areas of mathematics, we deal with a lot of infinite sets. However, as we will gradually discover, some infinite sets are bigger than others. Countably
More informationPoisson approximations
Chapter 9 Poisson approximations 9.1 Overview The Binn, p) can be thought of as the distribution of a sum of independent indicator random variables X 1 + + X n, with {X i = 1} denoting a head on the ith
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 27
CS 70 Discrete Mathematics for CS Spring 007 Luca Trevisan Lecture 7 Infinity and Countability Consider a function f that maps elements of a set A (called the domain of f ) to elements of set B (called
More informationIntroduction to Real Analysis Alternative Chapter 1
Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces
More informationRelationship between probability set function and random variable - 2 -
2.0 Random Variables A rat is selected at random from a cage and its sex is determined. The set of possible outcomes is female and male. Thus outcome space is S = {female, male} = {F, M}. If we let X be
More informationDiscrete Distributions
Discrete Distributions STA 281 Fall 2011 1 Introduction Previously we defined a random variable to be an experiment with numerical outcomes. Often different random variables are related in that they have
More informationCHAPTER 8: EXPLORING R
CHAPTER 8: EXPLORING R LECTURE NOTES FOR MATH 378 (CSUSM, SPRING 2009). WAYNE AITKEN In the previous chapter we discussed the need for a complete ordered field. The field Q is not complete, so we constructed
More informationProbability. Lecture Notes. Adolfo J. Rumbos
Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................
More informationPart IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More informationIn N we can do addition, but in order to do subtraction we need to extend N to the integers
Chapter 1 The Real Numbers 1.1. Some Preliminaries Discussion: The Irrationality of 2. We begin with the natural numbers N = {1, 2, 3, }. In N we can do addition, but in order to do subtraction we need
More informationLecture 2: Probability and Distributions
Lecture 2: Probability and Distributions Ani Manichaikul amanicha@jhsph.edu 17 April 2007 1 / 65 Probability: Why do we care? Probability helps us by: Allowing us to translate scientific questions info
More informationMAT 271E Probability and Statistics
MAT 71E Probability and Statistics Spring 013 Instructor : Class Meets : Office Hours : Textbook : Supp. Text : İlker Bayram EEB 1103 ibayram@itu.edu.tr 13.30 1.30, Wednesday EEB 5303 10.00 1.00, Wednesday
More informationECS 120 Lesson 18 Decidable Problems, the Halting Problem
ECS 120 Lesson 18 Decidable Problems, the Halting Problem Oliver Kreylos Friday, May 11th, 2001 In the last lecture, we had a look at a problem that we claimed was not solvable by an algorithm the problem
More informationMORE ON CONTINUOUS FUNCTIONS AND SETS
Chapter 6 MORE ON CONTINUOUS FUNCTIONS AND SETS This chapter can be considered enrichment material containing also several more advanced topics and may be skipped in its entirety. You can proceed directly
More informationA Readable Introduction to Real Mathematics
Solutions to selected problems in the book A Readable Introduction to Real Mathematics D. Rosenthal, D. Rosenthal, P. Rosenthal Chapter 10: Sizes of Infinite Sets 1. Show that the set of all polynomials
More informationThings to remember when learning probability distributions:
SPECIAL DISTRIBUTIONS Some distributions are special because they are useful They include: Poisson, exponential, Normal (Gaussian), Gamma, geometric, negative binomial, Binomial and hypergeometric distributions
More informationAlgorithms: Lecture 2
1 Algorithms: Lecture 2 Basic Structures: Sets, Functions, Sequences, and Sums Jinwoo Kim jwkim@jjay.cuny.edu 2.1 Sets 2 1 2.1 Sets 3 2.1 Sets 4 2 2.1 Sets 5 2.1 Sets 6 3 2.1 Sets 7 2.2 Set Operations
More informationProbability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?
Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical
More informationX n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)
14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence
More informationRANDOM WALKS AND THE PROBABILITY OF RETURNING HOME
RANDOM WALKS AND THE PROBABILITY OF RETURNING HOME ELIZABETH G. OMBRELLARO Abstract. This paper is expository in nature. It intuitively explains, using a geometrical and measure theory perspective, why
More informationSlides 8: Statistical Models in Simulation
Slides 8: Statistical Models in Simulation Purpose and Overview The world the model-builder sees is probabilistic rather than deterministic: Some statistical model might well describe the variations. An
More informationNotes Week 2 Chapter 3 Probability WEEK 2 page 1
Notes Week 2 Chapter 3 Probability WEEK 2 page 1 The sample space of an experiment, sometimes denoted S or in probability theory, is the set that consists of all possible elementary outcomes of that experiment
More informationAPM 541: Stochastic Modelling in Biology Probability Notes. Jay Taylor Fall Jay Taylor (ASU) APM 541 Fall / 77
APM 541: Stochastic Modelling in Biology Probability Notes Jay Taylor Fall 2013 Jay Taylor (ASU) APM 541 Fall 2013 1 / 77 Outline Outline 1 Motivation 2 Probability and Uncertainty 3 Conditional Probability
More informationLebesgue Measure on R n
CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets
More informationRandom variables. DS GA 1002 Probability and Statistics for Data Science.
Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities
More informationCME 106: Review Probability theory
: Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:
More information