Probability Distributions for Discrete RV

An example: Assume we toss a coin 3 times and record the outcomes. Let $X_i$ be a random variable defined by

$$X_i = \begin{cases} 1, & \text{if the $i$th outcome is Head} \\ 0, & \text{if the $i$th outcome is Tail} \end{cases}$$

Let X be the random variable such that $X = X_1 + X_2 + X_3$; then X represents the total number of Heads we could get from the experiment. If the probability of getting a Head on each toss is 0.7, then the probabilities of all the outcomes are tabulated as follows:

s      HHH    HHT    HTH    HTT    THH    THT    TTH    TTT
x      3      2      2      1      2      1      1      0
p(x)   0.343  0.147  0.147  0.063  0.147  0.063  0.063  0.027

Liang Zhang (UofU) Applied Statistics I June 17, 2008 1 / 17

Example continued: We can re-tabulate the table of outcomes above using only the x values:

x      0      1      2      3
p(x)   0.027  0.189  0.441  0.343

Now we can answer various questions. The probability that there are at most 2 Heads is

$$P(X \le 2) = P(X = 0 \text{ or } 1 \text{ or } 2) = p(0) + p(1) + p(2) = 0.657$$

The probability that the number of Heads is strictly between 1 and 3 is

$$P(1 < X < 3) = P(X = 2) = p(2) = 0.441$$
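The re-tabulated pmf can be checked by brute force. The following Python sketch enumerates the eight outcomes, assuming independent tosses with P(Head) = 0.7 as in the example; the names `P_HEAD`, `pmf`, `at_most_two`, and `strictly_between` are illustrative and do not come from the slides.

```python
from collections import defaultdict
from itertools import product

P_HEAD = 0.7  # probability of a Head on each toss (from the example)

# Accumulate P(X = x) over all 2^3 outcomes, where X counts the Heads.
pmf = defaultdict(float)
for outcome in product("HT", repeat=3):
    prob = 1.0
    for toss in outcome:
        prob *= P_HEAD if toss == "H" else 1 - P_HEAD
    pmf[outcome.count("H")] += prob

at_most_two = pmf[0] + pmf[1] + pmf[2]  # P(X <= 2)
strictly_between = pmf[2]               # P(1 < X < 3)

print({x: round(p, 3) for x, p in sorted(pmf.items())})
print(round(at_most_two, 3), round(strictly_between, 3))
```

Summing outcome probabilities that share the same x value is exactly the grouping step that turns the outcome table into the pmf table.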

Definition
The probability distribution or probability mass function (pmf) of a discrete rv is defined for every number x by

$$p(x) = P(X = x) = P(\text{all } s \in S : X(s) = x).$$

In words, for every possible value x of the random variable, the pmf specifies the probability of observing that value when the experiment is performed. (The conditions $p(x) \ge 0$ and $\sum_{\text{all possible } x} p(x) = 1$ are required for any pmf.)

Example 3.8
Six lots of components are ready to be shipped by a certain supplier. The number of defective components in each lot is as follows:

Lot                    1  2  3  4  5  6
Number of defectives   0  2  0  1  2  0

One of these lots is to be randomly selected for shipment to a particular customer. Let X be the number of defectives in the selected lot. The three possible X values are 0, 1 and 2. The pmf for X is

p(0) = P(X = 0) = P(lot 1 or 3 or 6 is selected) = 3/6 = 0.500
p(1) = P(X = 1) = P(lot 4 is selected) = 1/6 = 0.167
p(2) = P(X = 2) = P(lot 2 or 5 is selected) = 2/6 = 0.333
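The same pmf can be obtained by counting. The sketch below assumes each of the six lots is equally likely to be selected, which is how "randomly selected" is read in the example; the dictionary `defectives` and the name `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# lot number -> number of defectives (from the table in Example 3.8)
defectives = {1: 0, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0}

# Each lot is assumed equally likely, so p(x) = (# lots with x defectives) / 6.
counts = Counter(defectives.values())
pmf = {x: Fraction(n, len(defectives)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p, round(float(p), 3))
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on rounding.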

Example 3.10: Consider a group of five potential blood donors, a, b, c, d, and e, of whom only a and b have type O+ blood. Five blood samples, one from each individual, will be typed in random order until an O+ individual is identified. Let the rv Y = the number of typings necessary to identify an O+ individual. What is the pmf of Y?
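One way to explore this question is to enumerate every typing order. The sketch below assumes that "random order" means all 5! = 120 orderings of the donors are equally likely; the names `donors`, `o_positive`, and `pmf_y` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

donors = "abcde"
o_positive = {"a", "b"}  # only a and b have type O+ blood

# Assumption: every ordering of the five samples is equally likely.
orders = list(permutations(donors))
counts = Counter()
for order in orders:
    # Y = 1-based position of the first O+ donor in this typing order
    y = next(i for i, d in enumerate(order, start=1) if d in o_positive)
    counts[y] += 1

pmf_y = {y: Fraction(n, len(orders)) for y, n in sorted(counts.items())}
for y, p in pmf_y.items():
    print(y, p)
```

Under this assumption Y can only take the values 1 through 4, because with three non-O+ donors the fourth typing is certain to find an O+ individual.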

Example: Consider whether the next customer coming to a certain gas station buys gasoline or diesel. Let

$$X = \begin{cases} 1, & \text{if the customer purchases gasoline} \\ 0, & \text{if the customer purchases diesel} \end{cases}$$

If 30% of all customers in one month purchase diesel, then the pmf for X is

p(0) = P(X = 0) = P(next customer buys diesel) = 0.3
p(1) = P(X = 1) = P(next customer buys gasoline) = 0.7
p(x) = P(X = x) = 0 for $x \ne 0$ or 1

Example: Consider whether the next customer coming to a certain gas station buys gasoline or diesel. Let

$$X = \begin{cases} 1, & \text{if the customer purchases gasoline} \\ 0, & \text{if the customer purchases diesel} \end{cases}$$

If 100α% of all customers in one month purchase diesel, then the pmf for X is

p(0) = P(X = 0) = P(next customer buys diesel) = α
p(1) = P(X = 1) = P(next customer buys gasoline) = 1 − α
p(x) = P(X = x) = 0 for $x \ne 0$ or 1

Here α is between 0 and 1.

Definition
Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution. The collection of all probability distributions for different values of the parameter is called a family of probability distributions.

For the previous example, the quantity α is a parameter. Each different value of α between 0 and 1 determines a different member of a family of distributions; two such members are

$$p(x) = \begin{cases} 0.3, & \text{if } x = 0 \\ 0.7, & \text{if } x = 1 \\ 0, & \text{otherwise} \end{cases} \qquad p(x) = \begin{cases} 0.25, & \text{if } x = 0 \\ 0.75, & \text{if } x = 1 \\ 0, & \text{otherwise} \end{cases}$$

Example: Assume we are drawing cards, with replacement, from a well-shuffled deck of 100 cards. We keep drawing until we get a card of a certain designated type. Let p be the probability that a single draw yields such a card, i.e. there are 100p such cards in the deck. Assume the successive drawings are independent and define X = the number of drawings. Then

p(1) = P(X = 1) = P(such a card on draw 1) = p
p(2) = P(X = 2) = P(no such card on draw 1, then one on draw 2) = (1 − p) p
p(3) = P(X = 3) = P(no such card on draws 1 and 2, then one on draw 3) = (1 − p)(1 − p) p
...

A general formula would be

$$p(x) = \begin{cases} (1-p)^{x-1}\, p, & x = 1, 2, 3, \ldots \\ 0, & \text{otherwise} \end{cases}$$

Example: Assume we are drawing cards, with replacement, from a well-shuffled deck of 100 cards. We keep drawing until we get a card of a certain designated type. Let p be the probability that a single draw yields such a card, i.e. there are 100p such cards in the deck. Assume the successive drawings are independent and define X = the number of drawings. If we know that there are 20 such cards, i.e. p = 0.2, then what is the probability that we draw at most 3 times? More than 2 times?

$$P(X \le 3) = p(1) + p(2) + p(3) = 0.2 + 0.2 \cdot 0.8 + 0.2 \cdot (0.8)^2 = 0.488$$

$$P(X > 2) = p(3) + p(4) + p(5) + \cdots = 1 - p(1) - p(2) = 1 - 0.2 - 0.2 \cdot 0.8 = 0.64$$
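The two probabilities can be reproduced numerically from the general formula p(x) = (1 − p)^(x − 1) p. This is a minimal sketch; the function name `geometric_pmf` and the variable names are illustrative and not taken from the slides.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): the first success occurs on draw x, for x = 1, 2, 3, ..."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

p = 0.2  # 20 of the 100 cards are of the designated type

# P(X <= 3): sum the pmf over x = 1, 2, 3.
at_most_3 = sum(geometric_pmf(x, p) for x in (1, 2, 3))

# P(X > 2): use the complement instead of summing an infinite series.
more_than_2 = 1 - geometric_pmf(1, p) - geometric_pmf(2, p)

print(round(at_most_3, 3), round(more_than_2, 3))
```

The complement trick in the second calculation is the same one used on the slide: the infinite tail p(3) + p(4) + ... equals 1 minus the finitely many terms that precede it.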

Definition
The cumulative distribution function (cdf) F(x) of a discrete rv X with pmf p(x) is defined for every number x by

$$F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y)$$

For any number x, F(x) is the probability that the observed value of X will be at most x.

$F(x) = P(X \le x)$ = P(X is less than or equal to x)
$p(x) = P(X = x)$ = P(X is exactly equal to x)
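As an illustration of the definition, the sketch below builds the cdf of the number of Heads in the three-toss coin example (pmf values 0.027, 0.189, 0.441, 0.343 at x = 0, 1, 2, 3). The helper name `cdf` and the use of `bisect_right` are implementation choices made here, not part of the slides.

```python
from bisect import bisect_right
from itertools import accumulate

values = [0, 1, 2, 3]                  # possible values of X, in increasing order
probs = [0.027, 0.189, 0.441, 0.343]   # p(x) for each value
cumulative = list(accumulate(probs))   # running sums p(0), p(0)+p(1), ...

def cdf(x: float) -> float:
    """F(x) = sum of p(y) over all possible values y <= x."""
    k = bisect_right(values, x)  # number of possible values that are <= x
    return cumulative[k - 1] if k > 0 else 0.0

for x in (-1, 0, 1.5, 2, 3, 10):
    print(x, round(cdf(x), 3))
```

Because F is defined for every number x and not only for possible values of X, the function returns the same value for x = 1 and x = 1.5: the cdf of a discrete rv is a step function that jumps only at the possible values.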

Example 3.10 (continued):

$$F(y) = \begin{cases} 0, & \text{if } y < 1 \\ 0.4, & \text{if } 1 \le y < 2 \\ 0.7, & \text{if } 2 \le y < 3 \\ 0.9, & \text{if } 3 \le y < 4 \\ 1, & \text{if } y \ge 4 \end{cases}$$

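Example: Assume we are drawing cards, with replacement, from a well-shuffled deck of 100 cards. We keep drawing until we get a card of a certain designated type. Let α be the probability that a single draw yields such a card, i.e. there are 100α such cards in the deck. Assume the successive drawings are independent and define X = the number of drawings. The pmf would be

$$p(x) = \begin{cases} (1-\alpha)^{x-1}\,\alpha, & x = 1, 2, 3, \ldots \\ 0, & \text{otherwise} \end{cases}$$

Then for any positive integer x, we have

$$F(x) = \sum_{y \le x} p(y) = \sum_{y=1}^{x} (1-\alpha)^{y-1}\,\alpha = \alpha \sum_{y=0}^{x-1} (1-\alpha)^{y} = 1 - (1-\alpha)^{x},$$

so that, for integer x,

$$F(x) = \begin{cases} 1 - (1-\alpha)^{x}, & x \ge 1 \\ 0, & x < 1 \end{cases}$$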
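The closed form 1 − (1 − α)^x can be sanity-checked against the direct sum of pmf values. The sketch below does this for α = 0.2 and x = 1, ..., 20; both function names and the chosen range are illustrative assumptions.

```python
def geometric_cdf_sum(x: int, alpha: float) -> float:
    """F(x) computed directly as the sum of p(y) for y = 1, ..., x."""
    return sum((1 - alpha) ** (y - 1) * alpha for y in range(1, x + 1))

def geometric_cdf_closed(x: int, alpha: float) -> float:
    """F(x) from the closed form 1 - (1 - alpha)^x, for integer x."""
    return 1 - (1 - alpha) ** x if x >= 1 else 0.0

alpha = 0.2
# Largest disagreement between the two computations over x = 1, ..., 20.
max_gap = max(
    abs(geometric_cdf_sum(x, alpha) - geometric_cdf_closed(x, alpha))
    for x in range(1, 21)
)
print(max_gap)
print(round(geometric_cdf_closed(3, alpha), 3))
```

With α = 0.2 the closed form gives F(3) = 1 − 0.8³, which matches the value P(X ≤ 3) = 0.488 obtained by adding p(1), p(2), and p(3).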

pmf ⇒ cdf:

$$F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y)$$

It is also possible to go from cdf ⇒ pmf:

$$p(x) = F(x) - F(x^-)$$

where $x^-$ represents the largest possible X value that is strictly less than x.
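The cdf-to-pmf direction amounts to taking the jump of F at each possible value. The sketch below applies p(x) = F(x) − F(x−) to the cdf values 0.4, 0.7, 0.9, 1 given for Example 3.10 (continued); the names `support`, `cdf_at`, and `pmf` are illustrative.

```python
support = [1, 2, 3, 4]                      # possible values of Y, increasing
cdf_at = {1: 0.4, 2: 0.7, 3: 0.9, 4: 1.0}   # F(y) at each possible value

pmf = {}
previous = 0.0  # F(y-) for the smallest possible value is 0
for y in support:
    # The jump of F at y is the probability mass at y.
    pmf[y] = round(cdf_at[y] - previous, 10)  # round away floating-point noise
    previous = cdf_at[y]

print(pmf)
```

The loop relies on the support being listed in increasing order, so that `previous` always holds F at the largest possible value strictly below the current one.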

Proposition
For any two numbers a and b with $a \le b$,

$$P(a \le X \le b) = F(b) - F(a^-)$$

where $a^-$ represents the largest possible X value that is strictly less than a. In particular, if the only possible values are integers and if a and b are integers, then

$$P(a \le X \le b) = P(X = a \text{ or } a + 1 \text{ or } \ldots \text{ or } b) = F(b) - F(a - 1)$$

Taking a = b yields $P(X = a) = F(a) - F(a - 1)$ in this case.

Example (Problem 23): A consumer organization that evaluates new automobiles customarily reports the number of major defects in each car examined. Let X denote the number of major defects in a randomly selected car of a certain type. The cdf of X is as follows:

$$F(x) = \begin{cases} 0, & x < 0 \\ 0.06, & 0 \le x < 1 \\ 0.19, & 1 \le x < 2 \\ 0.39, & 2 \le x < 3 \\ 0.67, & 3 \le x < 4 \\ 0.92, & 4 \le x < 5 \\ 0.97, & 5 \le x < 6 \\ 1, & x \ge 6 \end{cases}$$

Calculate the following probabilities directly from the cdf: (a) $p(2)$, (b) $P(X > 3)$ and (c) $P(2 \le X < 5)$.
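One possible way to work the three parts is to apply the integer-valued form of the proposition, P(a ≤ X ≤ b) = F(b) − F(a − 1), to the tabulated cdf. The sketch below is a worked check under that reading, not a solution taken from the slides; the dictionary `F` and the variable names are illustrative.

```python
# F(x) at each integer value of X, read off the cdf in Problem 23.
F = {0: 0.06, 1: 0.19, 2: 0.39, 3: 0.67, 4: 0.92, 5: 0.97, 6: 1.0}

p_2 = F[2] - F[1]        # (a) p(2) = P(X = 2) = F(2) - F(1)
p_gt_3 = 1 - F[3]        # (b) P(X > 3) = 1 - P(X <= 3) = 1 - F(3)
p_2_to_4 = F[4] - F[1]   # (c) P(2 <= X < 5) = P(2 <= X <= 4) = F(4) - F(1)

print(round(p_2, 2), round(p_gt_3, 2), round(p_2_to_4, 2))
```

Part (c) uses the fact that X takes only integer values, so the strict inequality X < 5 is the same event as X ≤ 4.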