BSTT523: Pagano & Gauvreau, Chapter 7

Chapter 7: Theoretical Probability Distributions

Variable: a measured or categorized characteristic
Random Variable (R.V.) X: assumes values (x) by chance
Discrete R.V.: can assume a finite or countable number of values
Continuous R.V.: can assume any value within an interval

1. Discrete Random Variables
2. Some Discrete Distributions
   Bernoulli distribution
   Binomial distribution
   Poisson distribution
3. Continuous Random Variables
4. The Normal/Gaussian Distribution

1. Discrete Random Variables

Definition: The probability distribution of a discrete r.v. X is a table, graph, formula, or other device that specifies all possible values of X and their respective probabilities.

Example 7.1 (table form): Birth order of children in the U.S.

x: Birth Order    P(X = x)
1                 0.416
2                 0.330
3                 0.158
4                 0.058
5                 0.021
6                 0.009
7                 0.004
8+                0.004
Total             1.000

Discrete Probability Density Function (Discrete PDF):
$f(x) = P(X = x)$

Properties of the discrete PDF:
i. $0 \le f(x_i) \le 1$ for all $x_i$ (Non-negative)
ii. $\sum_{\text{all } x_i} f(x_i) = 1$ (Exhaustive)
iii. $P(X = x_i \text{ or } X = x_j) = f(x_i) + f(x_j)$ for $x_i \ne x_j$ (Additive)

Cumulative Distribution Function (CDF):
$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i) = \sum_{x_i \le x} f(x_i)$

Note:
i. $f(x_i) = P(X = x_i) = F(x_i) - F(x_{i-1})$
ii. $P(X < x_i) = F(x_{i-1})$
iii. $P(a < X \le b) = F(b) - F(a)$

Example 7.1: Birth Order

x: Birth Order    PDF f(x) = P(X = x)    CDF F(x) = P(X ≤ x)
1                 0.416                  0.416
2                 0.330                  0.746
3                 0.158                  0.904
4                 0.058                  0.962
5                 0.021                  0.983
6                 0.009                  0.992
7                 0.004                  0.996
8+                0.004                  1.000

Q1. Prob. that a child picked at random was the mother's 1st or 2nd child?
Q2. Prob. that a child picked at random was of birth order fewer than 4?
Q3. Prob. that a child picked at random was of order 5 or more?
Q4. Prob. that a child picked at random was of order between 3 and 5?
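
The four questions can be worked directly from the CDF column. The sketch below is one possible way to do so in Python; the variable names are illustrative, the "8+" category is stored under the key 8, and Q4 is read as the inclusive range 3 to 5, which is an assumption about the intended wording.

```python
from itertools import accumulate

# PDF from Example 7.1; key 8 stands for the "8+" category.
pdf = {1: 0.416, 2: 0.330, 3: 0.158, 4: 0.058,
       5: 0.021, 6: 0.009, 7: 0.004, 8: 0.004}

# CDF: F(x) is the running total of f(x_i) over x_i <= x.
cdf = dict(zip(pdf, accumulate(pdf.values())))

q1 = cdf[2]            # P(X <= 2): first or second child
q2 = cdf[3]            # P(X < 4) = F(3)
q3 = 1 - cdf[4]        # P(X >= 5) = 1 - F(4)
q4 = cdf[5] - cdf[2]   # P(3 <= X <= 5) = F(5) - F(2), assuming inclusive endpoints

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4)]:
    print(name, round(value, 3))
```

Each answer uses only the identities $P(X < x_i) = F(x_{i-1})$ and $P(a < X \le b) = F(b) - F(a)$.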

2. Some Discrete Distributions

Bernoulli Distribution

Bernoulli Variable: a binary variable
$X = \begin{cases} 1, & \text{success} \\ 0, & \text{failure} \end{cases}$

Bernoulli Trial: one performance of an experiment with a 0/1 outcome

Denote
$p = P(X = 1)$
$q = P(X = 0) = 1 - p$

The PDF of the Bernoulli distribution is
$f(x) = \begin{cases} p & \text{if } x = 1 \\ q & \text{if } x = 0 \end{cases}$
$\quad\;\; = p^x q^{1-x}, \quad x = 0, 1$
$\quad\;\; = p^x (1-p)^{1-x}, \quad x = 0, 1$

The Bernoulli distribution has one parameter, $p$.

If X follows a Bernoulli distribution, then
Mean: $\mu = E(X) = p$
Variance: $\sigma^2 = \mathrm{Var}(X) = pq = p(1-p)$

Examples of Bernoulli variables:
Ex. 1: flip a coin
$X = \begin{cases} 1, & \text{Heads} \\ 0, & \text{Tails} \end{cases}$
Ex. 2: roll a die, interested in 3's
$X = \begin{cases} 1, & \text{die falls on 3} \\ 0, & \text{otherwise} \end{cases}$
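
As a quick check of the mean and variance formulas, the sketch below evaluates $E(X)$ and $\mathrm{Var}(X)$ directly from the Bernoulli PDF for the die example. The function name `bernoulli_pdf` is illustrative, and $p = 1/6$ assumes a fair die, which the example does not state.

```python
def bernoulli_pdf(x, p):
    """f(x) = p^x (1-p)^(1-x) for x in {0, 1}; 0 for any other x."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p)**(1 - x)

p = 1 / 6  # assumption: fair die, success = "die falls on 3"

# Expectation and variance computed from the definition, summing over x = 0, 1.
mean = sum(x * bernoulli_pdf(x, p) for x in (0, 1))
var = sum((x - mean)**2 * bernoulli_pdf(x, p) for x in (0, 1))

print(mean, p)            # the two values should agree
print(var, p * (1 - p))   # the two values should agree
```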

Binomial Distribution

Perform n independent Bernoulli trials.
X = number of successes (1's)
p = probability of success in each trial, q = 1 − p
$X \sim \mathrm{BIN}(n, p)$

Q: What is the PDF $f(x)$, $x = 0, 1, \ldots, n$, of $X \sim \mathrm{BIN}(n, p)$? i.e., what is the probability of x successes in n Bernoulli trials?

Q1. 5 Bernoulli trials, $X \sim \mathrm{BIN}(5, p)$. P(result is 10010) = ?
Solution: $pqqpq = p^2 q^3$

Q2. Other results with 2 successes out of 5?

Number    Sequence
1         11000
2         10100
3         10010
4         10001
5         01100
6         01010
7         01001
8         00110
9         00101
10        00011

There are 10 ways to get 2 successes out of 5.
The probability of each sequence is $p^2 q^3$.
$P(\text{Sequence 1 or 2 or } \ldots \text{ or 10}) = 10 p^2 q^3$

Definition: A combination of n subjects taken x at a time is the number of unordered subsets of size x ("n choose x"):
${}_nC_x = \binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$
where $x! = x(x-1)(x-2)\cdots(2)(1)$ and we define $0! = 1$.

Example: "5 choose 2": how many subsets of 2 out of 5?
${}_5C_2 = \dfrac{5!}{2!\,(5-2)!} = \dfrac{5 \cdot 4}{2 \cdot 1} = 10$

Back to the binomial distribution question: $X \sim \mathrm{BIN}(5, p)$; f(2) = ? f(5) = ?
Ans:
$f(2) = {}_5C_2\, p^2 q^3 = 10 p^2 q^3$
$f(5) = {}_5C_5\, p^5 q^0 = \dfrac{5!}{5!\,0!}\, p^5 q^0 = 1 \cdot p^5 \cdot 1 = p^5$
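
The count of 10 can be confirmed by brute force. The sketch below enumerates every 0/1 sequence of length 5, keeps those with exactly 2 successes, and compares the count with the factorial formula and with Python's built-in `math.comb`. The variable names are illustrative.

```python
from itertools import product
from math import comb, factorial

n, x = 5, 2

# All length-n sequences of 1s and 0s with exactly x ones.
sequences = ["".join(bits) for bits in product("10", repeat=n)
             if bits.count("1") == x]

# n! / (x! (n-x)!) evaluated with exact integer arithmetic.
by_formula = factorial(n) // (factorial(x) * factorial(n - x))

print(sequences)
print(len(sequences), by_formula, comb(n, x))
```

Enumeration is only practical for small n, since there are $2^n$ sequences; the formula gives the same count without listing them.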

Binomial PDF f(x): $X \sim \mathrm{BIN}(n, p)$

P(x successes in n Bernoulli trials):
$f(x) = \begin{cases} {}_nC_x\, p^x q^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$

Number of Successes x    Probability f(x)
0                        ${}_nC_0\, q^n$
1                        ${}_nC_1\, p q^{n-1}$
...                      ...
x                        ${}_nC_x\, p^x q^{n-x}$
...                      ...
n-1                      ${}_nC_{n-1}\, p^{n-1} q$
n                        ${}_nC_n\, p^n$
Total                    1

Important Binomial distribution features:
Mean: $\mu = E(X) = np$
Variance: $\sigma^2 = \mathrm{Var}(X) = npq$
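
The three features in this summary (probabilities total 1, mean $np$, variance $npq$) can be checked numerically. The following is a minimal sketch; `binomial_pdf` is an illustrative helper, and the values n = 5, p = 0.3 are arbitrary choices, not taken from the notes.

```python
from math import comb

def binomial_pdf(x, n, p):
    """f(x) = nCx p^x (1-p)^(n-x) for x = 0..n; 0 otherwise."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.3  # illustrative values only
probs = [binomial_pdf(x, n, p) for x in range(n + 1)]

total = sum(probs)
mean = sum(x * f for x, f in enumerate(probs))
var = sum((x - mean)**2 * f for x, f in enumerate(probs))

print(total)                   # should be 1 up to rounding error
print(mean, n * p)             # should agree
print(var, n * p * (1 - p))    # should agree
```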

Example 7.2 Smoking in the U.S.: 29% are smokers, or p = .29
Select a random sample of size 10.

Q1. What is P(4 smokers in the sample)?
X = number of smokers out of n = 10
$X \sim \mathrm{BIN}(10, .29)$

Solution 1.
$f(4) = {}_{10}C_4\, (0.29)^4 (0.71)^6 = \dfrac{10!}{4!\,6!}\,(.00707)(.1281) = .1903$

Solution 2. Table A.1 (P.A1): Binomial PDF, p = 0.05 to 0.5, n = 2 to 20
f(4): n = 10, $p \approx .30$, so $f(4) \approx .2001$

Solution 3. SAS:
PROBBNML(p, n, m): CDF
PDF('BINOMIAL', x, p, n): PDF
CDF('BINOMIAL', x, p, n): CDF

Q2. P(6 or more smokers in the sample) = ?
$P(X \ge 6) = 1 - F(5) = 1 - (.9596) = .0404$
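
For readers without SAS or the table, both answers in Example 7.2 can be reproduced with a few lines of Python. This is a sketch using only the standard library; `binomial_pdf` and `binomial_cdf` are illustrative helper names, not functions from any package.

```python
from math import comb

def binomial_pdf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): sum of the PDF over 0..x."""
    return sum(binomial_pdf(k, n, p) for k in range(x + 1))

n, p = 10, 0.29

f4 = binomial_pdf(4, n, p)            # Q1: P(X = 4)
upper = 1 - binomial_cdf(5, n, p)     # Q2: P(X >= 6) = 1 - F(5)

print(round(f4, 4), round(upper, 4))
```

Because this uses p = .29 exactly, it should agree with Solution 1 rather than with the table value, which is read at the nearest tabulated p of .30.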

Q3. Among the 10 individuals chosen, what is the expected number of smokers?
$E(X) = np = 10 \times .29 = 2.9$

Variance and SD:
$\mathrm{Var}(X) = npq = 10(.29)(.71) = 2.059$
$SD = \sqrt{npq} = \sqrt{2.059} = 1.43$

Note: Using Table A.1, what if p > 0.5?
$f(x; n, p) = {}_nC_x\, p^x (1-p)^{n-x}$
$f(n-x; n, 1-p) = {}_nC_{n-x}\, (1-p)^{n-x} p^x$
${}_nC_x = \dfrac{n!}{x!\,(n-x)!} = \dfrac{n!}{(n-x)!\,x!} = {}_nC_{n-x}$
Therefore $f(x; n, p) = f(n-x; n, 1-p)$.

i.e., if p > 0.5, treat the complement $X^C = n - X$ (the number of failures) as the success count:
$P(X \le x)$ with $X \sim \mathrm{BIN}(n, p)$ equals $P(X^C \ge n - x)$ with $X^C \sim \mathrm{BIN}(n, 1-p)$

Example 7.3 What do you think about the problem of childhood obesity?
Poll in 2003: 55% of residents think it is serious. Randomly select n = 12 residents.

Q1. P(8 people think it is "serious")?
$X \sim \mathrm{BIN}(12, .55)$: f(8) = .1700
Same as P(4 out of 12 do not think "serious"):
$X \sim \mathrm{BIN}(12, .45)$: f(4) = .1700

Q2. P(5 or fewer think "serious") = ?
$P(X \le 5 \mid n = 12, p = .55) = P(X \ge 7 \mid n = 12, p = .45)$
$= 1 - P(X \le 6 \mid n = 12, p = .45)$
$= 1 - .7393 = .2607$

Q3. Among the sample of 12, what is the expected number of people who think childhood obesity is serious?
$E(X) = np = 12 \times .55 = 6.6$

Q4. What is the variance of the number who think childhood obesity is serious?
$\mathrm{Var}(X) = npq = 12(.55)(.45) = 2.97$
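
The complement trick used in Q1 and Q2 can be verified by computing each probability both ways. The sketch below does this with the standard library; `binomial_pdf` is an illustrative helper name.

```python
from math import comb

def binomial_pdf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 12

# Q1: 8 "serious" under p = .55 is the same event as 4 "not serious" under p = .45.
f8 = binomial_pdf(8, n, 0.55)
f4_complement = binomial_pdf(4, n, 0.45)

# Q2: P(X <= 5) under p = .55 versus P(X^C >= 7) under p = .45.
lower_direct = sum(binomial_pdf(k, n, 0.55) for k in range(0, 6))
upper_complement = sum(binomial_pdf(k, n, 0.45) for k in range(7, n + 1))

print(round(f8, 4), round(f4_complement, 4))
print(round(lower_direct, 4), round(upper_complement, 4))
```

Software can evaluate the p = .55 case directly; the complement form matters mainly when a printed table stops at p = 0.5.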

Poisson Distribution

X = number of event occurrences in a given interval of time/space/volume etc., i.e. count data

Probability that x events will occur:
$f(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$
$X \sim \mathrm{POI}(\lambda)$

Important Poisson features:
Mean: $E(X) = \lambda$
Variance: $\mathrm{Var}(X) = \lambda$

When $\lambda$ is small, the distribution is right-skewed; as $\lambda$ increases ($\lambda \ge 10$), the distribution becomes approximately symmetric.
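
The claim that the Poisson mean and variance both equal $\lambda$ can be checked numerically. The sketch below truncates the infinite sum at x = 59, which is an approximation that is more than adequate for the illustrative choice $\lambda = 2$ (a value not taken from the notes); `poisson_pdf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pdf(x, lam):
    """f(x) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0        # illustrative value only
xs = range(60)   # truncation of the infinite support; the tail beyond 59 is negligible here

total = sum(poisson_pdf(x, lam) for x in xs)
mean = sum(x * poisson_pdf(x, lam) for x in xs)
var = sum((x - mean)**2 * poisson_pdf(x, lam) for x in xs)

print(total, mean, var)  # expect values very close to 1, lam, lam
```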

Example 7.4 Allergic reaction to anesthesia (Laake and Rottingen)
Occurrences of reaction follow a Poisson distribution, with about 12 incidents expected per year.

Q1. In the next year, what is the probability of seeing 3 incidents?
Solution: $X \sim \mathrm{POI}(12)$
$f(3) = \dfrac{e^{-12}\, 12^3}{3!} = .00177$

Q2. What is the probability that at least 3 will have a reaction in the next year?
Solution 1:
$P(X \ge 3) = 1 - P(X \le 2) = 1 - F(2) = 1 - \{f(0) + f(1) + f(2)\}$
$= 1 - \left\{ \dfrac{e^{-12}\, 12^0}{0!} + \dfrac{e^{-12}\, 12^1}{1!} + \dfrac{e^{-12}\, 12^2}{2!} \right\}$
$= 1 - .00052225 = .99947775 \approx .9995$

Solution 2: Table A.2 (P.A-6): Poisson PDF
$P(X \ge 3) = 1 - F(2) = 1 - (.0000 + .0001 + .0004) = .9995$

Solution 3: SAS:
POISSON(λ, x): CDF
PDF('POISSON', x, λ): PDF
CDF('POISSON', x, λ): CDF
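
As a further cross-check of Example 7.4, the same two probabilities can be computed in Python from the Poisson formula. This is a sketch using only the standard library; `poisson_pdf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pdf(x, lam):
    """f(x) = e^(-lam) lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

lam = 12  # expected incidents per year, from Example 7.4

f3 = poisson_pdf(3, lam)                                     # Q1: P(X = 3)
at_least_3 = 1 - sum(poisson_pdf(k, lam) for k in range(3))  # Q2: 1 - {f(0)+f(1)+f(2)}

print(round(f3, 5), round(at_least_3, 4))
```

The table-based answer differs slightly from the direct calculation only because the tabulated probabilities are rounded to four decimals.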

3. Continuous Random Variables

A continuous X can assume any value within its range. Within any interval, there are theoretically an infinite number of values.

Subareas of histograms represent the frequency of occurrence of values within class intervals.
Total frequency of values between a and b: add all subareas for intervals a through b.
If the width of the class intervals is very small, then connecting midpoints (creating a frequency polygon) creates a smooth curve.
If probability is shown on the y-axis and we have a smooth curve, the curve is the probability density function (PDF) f(x).

$P(a < X \le b)$ = total area under f(x) between a and b, or $\int_a^b f(t)\,dt$.

Cumulative distribution function (CDF) of X:
$F(x) = \int_{-\infty}^{x} f(t)\,dt$

Note: Total area under f(x) = 1, i.e.,
$\int_{-\infty}^{+\infty} f(t)\,dt = 1$
and
$f(x) = \dfrac{d}{dx} F(x) = F'(x)$
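
4. A special continuous distribution: the Normal or Gaussian

Normal PDF:
$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < +\infty$
$X \sim N(\mu, \sigma^2)$

Characteristics:
Distribution is symmetric around $\mu$
Mean = Median = Mode = $\mu$
Total area under the curve = 1, i.e.,
$\int_{-\infty}^{+\infty} \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$
Area under the curve between $\mu - \sigma$ and $\mu + \sigma$ ≈ .68
Area under the curve between $\mu - 2\sigma$ and $\mu + 2\sigma$ ≈ .95
Area under the curve between $\mu - 3\sigma$ and $\mu + 3\sigma$ ≈ .997
$E(X) = \mu$ (location parameter)
$\mathrm{Var}(X) = \sigma^2$ (scale parameter)

Standard Normal Distribution: $Z \sim N(0, 1)$ has PDF
$\varphi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < +\infty$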
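
The area statements can be checked by integrating the normal PDF numerically. The sketch below uses a simple midpoint rule, which is an approximation chosen here for transparency; the helper names and the values μ = 10, σ = 2 are illustrative and not taken from the notes.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def area(a, b, mu, sigma, steps=10_000):
    """Midpoint-rule approximation of the integral of the PDF over [a, b]."""
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))

mu, sigma = 10.0, 2.0  # illustrative values only

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: area(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)}

# The range mu +/- 10 sigma stands in for the whole real line.
total = area(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)

print({k: round(v, 4) for k, v in within.items()}, round(total, 4))
```

The same areas result for any choice of μ and σ, which is why a single standard normal table suffices.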

Table A.3: Standard Normal Upper Tail Cumulative Probabilities
$P(Z \ge z_0) = 1 - \Phi(z_0), \quad z_0 \ge 0$
where $\Phi(z) = \int_{-\infty}^{z} \varphi(t)\,dt$ is the CDF for Z.
For $z_0 < 0$: $\Phi(z_0) = P(Z \le z_0) = P(Z \ge -z_0)$, with $-z_0 > 0$.

Example 7.5 Given a variable that follows the standard normal distribution, i.e. $Z \sim N(0, 1)$, what are $P(Z \ge 1)$ and $P(Z \le -1)$?
Solution: by Table A.3, $P(Z \ge 1) = 0.159$ and $P(Z \le -1) = P(Z \ge 1) = 0.159$

Example 7.6 Randomly pick a value z from the standard normal distribution. P(z has a value between −2 and +2) = ?
Solution: Note that for a continuous distribution P(X = x) = 0.
$P(-2 \le Z \le +2) = P(-2 < Z < +2)$
$= 1 - P(Z \ge 2) - P(Z \le -2)$
$= 1 - 2 \cdot P(Z \ge 2)$
$= 1 - 2(.023) = 0.954$
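
In place of a table lookup, $\Phi(z)$ can be evaluated with the error function through the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$. This identity is not part of the notes; it is swapped in here as a convenient stand-in for Table A.3, and the function names are illustrative.

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def upper_tail(z):
    """P(Z >= z), the quantity tabulated in Table A.3."""
    return 1 - phi_cdf(z)

# Example 7.5: the two tails are equal by symmetry.
p_ge_1 = upper_tail(1)
p_le_minus_1 = phi_cdf(-1)

# Example 7.6: central area between -2 and +2.
p_between = 1 - 2 * upper_tail(2)

print(round(p_ge_1, 3), round(p_le_minus_1, 3), round(p_between, 3))
```

The central area may differ from the table-based .954 in the third decimal, because the table value .023 is itself rounded.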

How is the N(0,1) distribution related to $N(\mu, \sigma^2)$?
If $X \sim N(\mu, \sigma^2)$ and $Z = \dfrac{X - \mu}{\sigma}$, then $Z \sim N(0, 1)$.

Example 7.7 Systolic Blood Pressure (SBP) (p.181 P&G)
X = SBP for 18-74 year old males; $X \sim N(\mu, \sigma^2)$ with μ = 129 mm Hg and σ = 19.8 mm Hg.

Find x which is the cutoff for the upper 2.5% of the SBP distribution; i.e. find x such that P(X > x) = .025.
Solution: By Table A.3 we know that $P(Z \ge 1.96) = .025$.
$\dfrac{x - \mu}{\sigma} = 1.96$
$\dfrac{x - 129}{19.8} = 1.96$
$x = (1.96)(19.8) + 129 = 167.8$

What proportion of men in this population have SBP > 150 mm Hg?
Solution:
$P(X > 150) = P\left( \dfrac{X - \mu}{\sigma} > \dfrac{150 - 129}{19.8} \right) = P(Z > 1.06) = 0.145$, i.e. 14.5%
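
Both parts of Example 7.7 can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch, not the method used in the notes, which rely on Table A.3; the variable names are illustrative.

```python
from statistics import NormalDist

# SBP model from Example 7.7: mean 129 mm Hg, standard deviation 19.8 mm Hg.
sbp = NormalDist(mu=129, sigma=19.8)

# Cutoff for the upper 2.5%: the 97.5th percentile.
cutoff = sbp.inv_cdf(0.975)

# Proportion with SBP above 150 mm Hg: P(X > 150) = 1 - F(150).
prop_above_150 = 1 - sbp.cdf(150)

# The same proportion via standardization, Z = (X - mu) / sigma.
z = (150 - sbp.mean) / sbp.stdev
prop_via_z = 1 - NormalDist().cdf(z)

print(round(cutoff, 1), round(prop_above_150, 3), round(prop_via_z, 3))
```

Small differences from the hand calculation are expected, since the table method rounds z to 1.06 before the lookup.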

Example 7.8 Breath study (Diskin et al.)
X = Ammonia concentration in parts per billion (ppb)
μ = 491 ppb, σ = 119 ppb; i.e. $X \sim N(491, 119^2)$
$P(292 \le X \le 649) = ?$

Solution 1:
$P(292 \le X \le 649) = P\left( \dfrac{292 - 491}{119} \le \dfrac{X - \mu}{\sigma} \le \dfrac{649 - 491}{119} \right)$
$= P(-1.67 \le Z \le 1.33)$
$= 1 - P(Z \le -1.67) - P(Z \ge 1.33)$
$= 1 - .047 - .092 = .861$

Solution 2: SAS:
ProbNorm(x): N(0,1) CDF
PDF('NORMAL', x): N(0,1) PDF
PDF('NORMAL', x, μ, σ): $N(\mu, \sigma^2)$ PDF (σ is the standard deviation)
CDF('NORMAL', x): N(0,1) CDF
CDF('NORMAL', x, μ, σ): $N(\mu, \sigma^2)$ CDF (σ is the standard deviation)
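
The same interval probability can be computed in Python, either on the original scale or after standardizing. The sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later) as an alternative to the table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 491, 119   # ammonia concentration in ppb, from Example 7.8
lo, hi = 292, 649

# Directly on the original scale: F(649) - F(292).
ammonia = NormalDist(mu=mu, sigma=sigma)
p_direct = ammonia.cdf(hi) - ammonia.cdf(lo)

# After standardizing both endpoints: Phi(z_hi) - Phi(z_lo).
z_lo = (lo - mu) / sigma
z_hi = (hi - mu) / sigma
std = NormalDist()     # standard normal, N(0, 1)
p_standardized = std.cdf(z_hi) - std.cdf(z_lo)

print(round(z_lo, 2), round(z_hi, 2), round(p_direct, 3), round(p_standardized, 3))
```

The two routes agree with each other exactly and with Solution 1 to about three decimals; the hand calculation rounds z and the tail probabilities along the way.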