Discrete random variables and probability distributions random variable is a mapping from the sample space to real numbers. notation: X, Y, Z,... Example: Ask a student whether she/he works part time or not. S= {Yes, No}. X=1 if Yes, X=0 if No. A random variable whose only possible values are 0 and 1 is called a Bernoulli random variable. Y = number of car accidents in a week. Flip a coin three times. Let Z=number of heads in three flips. {TTT } Z = 0 {HTT, THT, TTH} Z = 1 {HHT, HTH, THH} Z = 2 {HHH} Z = 3. W = the weight of a randomly selected athlete.
discrete random variable: takes finite or countably infinite values. continuous random variable: takes all values in an interval or union of intervals.
probability distribution or probability mass function (pmf) of a discrete rv is defined by p(x) = P(X = x) = P(all s S: X (s) = x). Note p(x) 0 and x p(x) = 1. Example: Suppose 20% of JMU students work part time. X = 1 if yes, X = 0 if no. p(0) = P(X = 0) = P(no) = 0.8 p(1) = P(X = 1) = P(yes) = 0.2. Example: Flip a fair coin 3 times. X = number of heads in 3 flips. p(0) = P(X = 0) = P{TTT } = 1/8, p(1) = P(X = 1) = P{HTT, THT, TTH} = 3/8, p(2) = P(X = 2) = P{HHT, HTH, THH, } = 3/8, p(3) = P(X = 3) = P{HHH} = 1/8.
parameter of a distribution Example : The probability of getting a head in each flip in p, then p(0) = P(X = 0) = P{TTT } = (1 p) 3, p(1) = P(X = 1) = P{HTT, THT, TTH} = 3 p (1 p) 2, p(2) = P(X = 2) = P{HHT, HTH, THH, } = 3 p 2 (1 p), p(3) = P(X = 3) = P{HHH} = p 3. p is a parameter of this distribution.
CDF The cumulative distribution function (cdf) F (x) of a discrete rv X with pmf p(x) is defined for any number x by F (x) = P(X x) = y:y x p(y). Example 1: The pmf of a rv X is given below: x 1 2 3 4 p(x) 0.4 0.3 0.2 0.1 Note that F (1) = P(X 1) = P(X = 1) = 0.4 F (2) = P(X 2) = P(X = 1or2) = 0.4 + 0.3 = 0.7 F (3) = P(X 3) = P(X = 1or2or3) = 0.4 + 0.3 + 0.2 = 0.9 F (4) = P(X 4) = P(X = 1or2or3or4) = 0.4 + 0.3 + 0.2 + 0.1 = 1.0
The cdf of X is plot of F (x): F (x) = = 0, x < 1 = 0.4, 1 x < 2 = 0.7, 2 x < 3 = 0.9, 3 x < 4 = 1.0, x 4
Expected values The expected value or the mean of a discrete rv X is E(X ) = µ x = x xp(x). For example 1, E(X ) = 1 0.4 + 2 0.3 + 3 0.2 + 4 0.1 = 2.0
Expected value of a function of X, h(x ): E(h(X )) = x h(x)p(x). Example 1 continued: Let h(x) = 1 x. E(h(X )) = 1 0.4 + 1 2 0.3 + 1 3 0.2 + 1 4 0.1 = 0.641. Proposition: E(aX + b) = ae(x ) + b. Proof: E(aX + b) = x (ax + b)p(x) = a x xp(x) + b x p(x) = ae(x ) + b. Two special cases of the proposition: For any constants a and b, E(aX ) = ae(x ), E(X + b) = E(X ) + b.
Variance of X Let X have pmf p(x) with mean µ. The variance of X is V (X ) = σx 2 = x (x µ)2 p(x) = E(X µ) 2 = E(X 2 ) µ 2. The standard deviation of X is σ X = σx 2. Example 1: continued. E(X 2 ) = 1 2 0.4 + 2 2 0.3 + 3 2 0.2 + 4 2 0.1 = 5 V (X ) = E(X 2 ) µ 2 = 5 2 2 = 1. Rules of variance: V (h(x )) = x [h(x) E(h(x))]2 p(x). V (ax + b) = V (ax ) = a 2 V (X ). proof: V (ax ) = E(aX aµ) 2 = Ea 2 (X µ) 2 = a 2 E(X µ) 2 = a 2 V (X ).
Moments and moment generating functions moments: expected values of X r or (X µ) r, where r is an integer. Moments (about 0): E(X r ). e.g., EX : first moment Moments about mean: E(X µ) r.e.g.,e(x µ) 2 = V (X ). The moment generating function (mgf) of a discrete rv is defined to be M X (t) = E(e tx ) = x etx p(x). We say the mgf exists if it is defined for t ( t 0, t 0 ) where t 0 > 0.
compute mgf Example: Let X be a Bernoulli rv with p(0)=1/3 and p(1)=2/3, then M X (t) = E(e tx ) = x etx p(x) = e t 0 p(0) + e t 1 p(1) = 1 3 + et 2 3. proposition: If the mgf exists and is the same for two distributions, then the two distributions are the same. Theorem: If the mgf exists, E(X r ) = M (r) X (0), where M(r) X (t) is the rth derivative of M X (t). Example: M X (t) = 0.4e t + 0.3e 2t + 0.2e 3t + 0.1e 4t. M X (t) = 0.4et + 0.6e 2t + 0.6e 3t + 0.4e 4t, and M X (0) = 0.4 + 0.6 + 0.6 + 0.4 = 2 = E(X ). M X (t) = 0.4et + 1.2e 2t + 1.8e 3t + 1.6e 4t. M X (0) = 0.4 + 1.2 + 1.8 + 1.6 = 5 = E(X 2 ). so V (X ) = E(X 2 ) µ 2 = 5 2 2 = 1.
The Binomial distribution Binomial experiment: 1. The experiment consists of n trials. 2. Each trial has two possible outcomes, success or failure. 3. The trials are independent. 4. The probability of success, p, is constant from trial to trial. Binomial random variable X is defined to be the number of successes out of n trials. Note the possible vales X can take are 0, 1, 2,...n.
Find binomial probabilities The multiple problem has 4 possible answers and only one of them is correct. If a student randomly guesses the answers for three problems, what is the probability that he can get two problems right? n = 3, p = 0.25. The three ways he can get two problems right are CCI, CIC, ICC. Each way has a probability 0.25 0.25 0.75 = 0.047. The probability of the three ways is: 3 0.25 0.25 0.75 = 0.14. Suppose he has to answer 5 problems. What is the probability that he can get 3 problems right? One particular way he can get 3 right is CCCII, its probability is 0.25 3 0.75 2 = 0.0088. There are 10 ways he can get 3 right. The probability of these ten ways is 10 0.25 3 0.75 2 = 0.09.
The formula for binomial distribution P(X = x) = n! x!(n x)! px (1 p) n x, x = 0, 1, 2,...n. EX = xp(x) = np, V (X ) = np(1 p). It is known that the 80% of patients who receive heart transplant surgery will survive. A hospital just performed heart transplant surgeries on 6 patients. Find the probability that a). All of them will survive. b). Exactly 4 of them will survive.
p(x = 6) = 0.8 6 = 0.262. p(x = 4) = 6! 4!2! 0.84 0.2 2 = 0.246.
Binomial theorem: (a + b) n = n ( n x=0 x) a x b n x. mgf of X : M X (t) = x etx p(x) = n x=0 etx( n x) p x (1 p) n x = n ( n ) x=0 x (pe t ) x (1 p) n x = (pe t + 1 p) n. Use mgf to verify that EX = np, V (X ) = np(1 p).
Geometric distribution In Bernoulli repeated trials, some times we are interested in the X =number of trials needed to have the first success. A random variable is said to have a geometric distribution if its pmf is given by p(x; p) = p(1 p) x 1, x = 1, 2, 3,. E(X ) = 1 1 p p, V (X ) =. p 2 Example: A family decides to have children until it has a son. What the probability that the family will have 4 children if P(B) = 0.51, P(G) = 0.49? P(X = 4) = P(GGGB) = 0.49 3 0.51 = 0.06.
Hypergeometric distribution The population consists of N individuals/elements. Each element is either a Success (S) or Failure (F). There are M successes in the population. A random sample of n elements is selected without replacement. X = number of S s in the sample. e.g. A class consists of 10 female and 15 male students. A committee made of 4 students is to be randomly chosen from the class. X = number of females in the committee. x = 0, 1, 2, 3, 4. P(X = 0) = (15 4 ) ( 25 4 ) = 0.1079. P(X = 2) = (10 2 )( 15 2 ) = 0.3735. ( 25 4 ) In general, P(X = x) = (M x )( N M n x ), max(0, n N + M) x min(n, M). ( N n) E(X ) = n M N n N, V (X ) = N 1 n M N (1 M N ).
The Poisson distribution A random variable is said to have a Poisson distribution with parameter λ(λ > 0) if the pmf of X is p(x; λ) = e λ λ x x!, x = 0, 1, 2, 3,. The mean and variance of a Poisson distribution: E(X ) = λ, V (X ) = λ. proposition: The binomial pmf approaches the Poisson pmf if we let n, p 0 in such a way that np λ > 0.
Example: Let X be the number of creatures of a certain type captured in a trap during a day. Suppose X has a Poisson distribution with λ = 4.5. i.e., on average the trap will capture 4.5 creatures a day. The probability that a trap capture 5 creatures in a day is P(X = 5) = e 4.5 (4.5) 5 5! = 0.1708. The probability that the trap will capture 0 creatures is P(X = 0) = e 4.5 (4.5) 0 0! = e 4.5 = 0.011. The probability that the trap will capture 12 creatures is P(X = 20) = e 4.5 (4.5) 12 12! = 0.0016.
Approximate binomial probability If 2 percent of the books bound at a certain factory have defective bindings, find the probability the five of 400 books bound by this factory will have defective bindings. Poisson approximation: λ = 400 0.2 = 8, P(X = 5) = e 8 8 5 5! = 0.093. Binomial probability: n=400, p=0.02, P(X = 5) = 400! 5!395! 0.025 (0.98) 395 = 0.091.
1. Let X be a discrete rv with P(X = 2) = P(X = 2) = p 2, and P(X = 0) = 1 p, where p is a number between 0 and 1. Find E(X ), V (X ) and E(log(X + 3)). E(log(X + 3)) = log(x + 3)p(x) = log( 2+3) p 2 +log(2+3) p 2 +log(0+3) (1 p) = p 2 log 5+(1 p) log 3. 2. An automobile safety engineer claims that 1 in 10 automobile accidents is due to driver fatigue. If this is true, what is the probability that at least 3 of 5 automobile accidents are due to driver fatigue? Let X be the number of auto accidents due to driver fatigue among 5 auto accidents, then X has a binomial distribution with n = 5, p = 0.1. P(X 3) = P(X = 3) + P(X = 4) + P(X = 5) = 0.00856.
Exercise Customers at a gas station pay with a credit card (A), debit card (B), or cash (C). Assume that successive customers make independent choices with P(A) = 0.5, P(B) = 0.2 and P(C) = 0.3. 1). Among the next 10 customers what is the probability that exactly 4 will pay with a debit card? 2). Among the next 100 customers, what are the mean and variance of the number who pay with a debit card?
Exercise Suppose small aircraft arrive at an airport according to a Poisson process with rate 8 per hour, so that the number of arrivals during a time period of t hours is a Poisson rv with λ = 8t. 1). What is the probability that exactly 6 small aircraft will arrive during a 1 hour period? 2). What are the expected value of the number of small aircraft that arrive during a 90 min period?
Exercise In testing of circuit boards, the probability that any particular diode will fail is 0.01. A circuit board contains 200 diodes. 1). How many diodes would you expect to fail on a circuit board? 2). What is the probability that zero diodes will fail on a randomly selected board? 3). If five boards are shipped to a customer, what is the probability that at least 4 of them work properly? ( A board works only if all its diodes work.)