BSTT523: Pagano & Gauvreau, Chapter 6

Chapter 6: Probability

Definitions (6.1 in P&G)
  Experiments; trials; probabilities
Event operations
  Intersection; Union; Complement
  Venn diagrams
Conditional probability
Marginal probability
Bayes' Theorem
Independent events
Example: Employment status and hearing impairment
Example: Age at onset of bipolar disorder
Diagnostic tests (6.4 in P&G)
  Sensitivity; Specificity; Predictive Value
  Examples:
    Cervical cancer in women
    Tuberculosis and X-ray screen
    Self-reported BMI & obesity
Relative Risks (RR) and Odds Ratios (OR)
Definitions

Classical definitions:
  Experiment: the process of obtaining an observed result of some phenomenon
  Probability of A:
    N : number of all possible results/outcomes
    A : event with the characteristic of interest
    m : number of occurrences of A
    $P(A) = \frac{m}{N}$

Example 1: Randomly draw 1 card from a well-shuffled deck of 52 cards
  N = 52; A = card is a heart; m = 13
  $P(A) = \frac{13}{52} = 0.25$

Example 2: Roll a fair six-sided die
  N = 6; A = get a 5; m = 1
  $P(A) = \frac{1}{6}$
Frequentist definitions:
  Trial: performance of 1 experiment
  n : number of independent trials
  E : event with the characteristic of interest
  m : number of occurrences of E
  Probability of E:
    $P(E) = \text{relative frequency of } E = \frac{m}{n}$

Example: Flip a fair coin (heads H or tails T) 5 times. Observe H, T, T, T, H.
  E = H; n = 5; m = 2
  $P(H) = \frac{2}{5} = 0.4$
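The relative-frequency calculation in the coin-flip example can be sketched in a few lines of Python. This is only an illustration; the helper name `relative_frequency` is an assumption introduced here and does not come from P&G.

```python
from fractions import Fraction


def relative_frequency(outcomes, event):
    """Return m/n: the share of the n trials in which `event` occurred."""
    n = len(outcomes)                               # number of trials
    m = sum(1 for o in outcomes if o == event)      # occurrences of the event
    return Fraction(m, n)


# Observed sequence from the slide: H, T, T, T, H
flips = ["H", "T", "T", "T", "H"]
p_heads = relative_frequency(flips, "H")
print(p_heads, float(p_heads))
```

With only five trials the estimate (2/5) is far from the 0.5 expected of a fair coin; the relative frequency settles near the underlying probability only as n grows.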
Event Operations:

Intersection / AND operator: $A \cap B$ (both A and B)
  Joint probability: $P(A \cap B)$
  Mutually exclusive events: Let $\phi$ denote a null event, i.e., $P(\phi) = 0$. Events A and B are mutually exclusive if $A \cap B = \phi$ and $P(A \cap B) = 0$.

Union / OR operator: $A \cup B$ (either A or B or both)
  Additive rule of probability:
    If A and B are mutually exclusive ($P(A \cap B) = 0$), then
      $P(A \cup B) = P(A) + P(B)$
    If A and B are not mutually exclusive, then
      $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Venn diagram for two overlapping events A and B:

[Figure: Venn diagram with four regions: "A, not B"; "A & B"; "B, not A"; and "not A or B" (the area outside both circles). Labeled probabilities: $P(A)$, $P(B)$, $P(A \cap B)$, $P(A \cup B)$.]

The size of each area represents the magnitude of the probability.

A naïve (and incorrect) way to calculate $P(A \cup B)$ is $P(A) + P(B)$; this method would count the overlapping area $P(A \cap B)$ twice.
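The double-counting problem can be checked by enumerating a 52-card deck. The sketch below reuses the event "card is a heart" from the earlier card example; the second event, "card is a face card (J, Q, K)", is an assumption added here purely for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))            # 52 equally likely outcomes

hearts = {card for card in deck if card[1] == "hearts"}           # event A
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}     # event B (illustrative)


def prob(event):
    """Classical probability: outcomes in the event / all outcomes."""
    return Fraction(len(event), len(deck))


p_union = prob(hearts | faces)                                     # direct count
p_additive = prob(hearts) + prob(faces) - prob(hearts & faces)     # additive rule
p_naive = prob(hearts) + prob(faces)                               # counts overlap twice

print(p_union, p_additive, p_naive)
```

The direct count and the additive rule agree, while the naive sum is too large by exactly $P(A \cap B)$, the three face cards that are also hearts.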
Complement: $A^c$ or $\bar{A}$ (NOT A)

  $P(A \cap \bar{A}) = 0$
  $P(A \cup \bar{A}) = 1$
  $P(A) + P(\bar{A}) = 1$
  $P(\bar{A}) = 1 - P(A)$
Conditional Probability

$P(A \mid B)$ = P(A given prior occurrence of B) = P(A conditional on B)

Question: Does prior occurrence/condition B change P(A)?

Definition of conditional probability:
  $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Multiplicative rule of probability: For any events A and B with $P(A) > 0$ and $P(B) > 0$,
  $P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$
Marginal Probability

Events $A_1, \ldots, A_m$ are mutually exclusive and exhaustive (M.E.E.) if
  $P(A_i \cap A_j) = 0$ for all $i \neq j$ (mutually exclusive)
  $P(A_1) + P(A_2) + \cdots + P(A_m) = 1$ (exhaustive)

Jointly occurring sets of events A and B:
  A has m categories; events $A_1, \ldots, A_m$ are M.E.E.
  B has n categories; events $B_1, \ldots, B_n$ are M.E.E.

The marginal probability of $A_i$ is
  $P(A_i) = \sum_{j=1}^{n} P(B_j)\,P(A_i \mid B_j)$

The marginal probability of $B_j$ is
  $P(B_j) = \sum_{i=1}^{m} P(A_i)\,P(B_j \mid A_i)$

Law of Total Probability (P&G p.133)
Bayes' Theorem

If events $A_1, \ldots, A_m$ are mutually exclusive and exhaustive, then
  $P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{m} P(A_j)\,P(B \mid A_j)}$

Independent Events

Events A and B are independent if
  $P(A \cap B) = P(A)\,P(B)$

Note: if $P(A) > 0$, $P(B) > 0$, and A and B are independent, then
  $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$

Question: If two events are mutually exclusive, are they independent?
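One way to explore the closing question is to test the independence definition on a fair six-sided die. The three events below (`low`, `middle`, `even`) are hypothetical choices made for this sketch, not examples from P&G.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}        # fair six-sided die


def prob(event):
    """Classical probability of an event on the die."""
    return Fraction(len(event & outcomes), len(outcomes))


def independent(a, b):
    """Check the definition P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


low = {1, 2}        # illustrative event
middle = {3, 4}     # mutually exclusive with `low`
even = {2, 4, 6}    # overlaps `low` in the outcome 2

# Mutually exclusive, both with positive probability: 0 versus 1/9
exclusive_pair_independent = independent(low, middle)
# Overlapping events: 1/6 versus (1/3)(1/2)
overlapping_pair_independent = independent(low, even)

print(exclusive_pair_independent, overlapping_pair_independent)
```

In this sketch the mutually exclusive pair fails the independence check: knowing that `low` occurred rules out `middle` entirely, so the two events are strongly dependent.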
Example: Employment Status and Hearing Impairment (P&G p.132-3):

  B = Has a hearing impairment
  $A_1$ = Currently employed
  $A_2$ = Currently unemployed
  $A_3$ = Not in the labor force

  $P(A_1) = .6063$, $P(A_2) = .0457$, $P(A_3) = .3480$
  $P(B \mid A_1) = .0056$, $P(B \mid A_2) = .0036$, $P(B \mid A_3) = .0065$

Apply the Law of Total Probability:
  $P(B) = P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + P(A_3)P(B \mid A_3)$
  $= (.6063)(.0056) + (.0457)(.0036) + (.3480)(.0065)$
  $= .0034 + .0002 + .0023 = .0059$
  (Each product is rounded to four decimals before summing; the unrounded sum is about .0058.)

Apply Bayes' Theorem:
  $P(A_1 \mid B) = \frac{P(A_1)P(B \mid A_1)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + P(A_3)P(B \mid A_3)} = \frac{(.6063)(.0056)}{P(B)} = .583$
  (computed from the unrounded numerator and denominator, .003395 / .005822)
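The two steps of this example can be reproduced with a short Python sketch. The probabilities are the ones given on the slide; the dictionary keys and variable names are illustrative. Without intermediate rounding, P(B) comes out near .0058; rounding each product to four decimals before summing gives .0059.

```python
# Prior probabilities P(A_i) for the three employment categories
priors = {
    "employed": 0.6063,
    "unemployed": 0.0457,
    "not_in_labor_force": 0.3480,
}

# Conditional probabilities P(B | A_i) of a hearing impairment
likelihoods = {
    "employed": 0.0056,
    "unemployed": 0.0036,
    "not_in_labor_force": 0.0065,
}

# Law of Total Probability: P(B) = sum over i of P(A_i) * P(B | A_i)
p_b = sum(priors[k] * likelihoods[k] for k in priors)

# Bayes' Theorem: P(A_i | B) = P(A_i) * P(B | A_i) / P(B)
posteriors = {k: priors[k] * likelihoods[k] / p_b for k in priors}

print(round(p_b, 4))
for status, value in posteriors.items():
    print(status, round(value, 3))
```

Because the three categories are mutually exclusive and exhaustive, the posterior probabilities sum to 1, which is a quick sanity check on the calculation.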
Example: Age at onset of bipolar disorder and family history

A random sample of subjects with bipolar disorder

                                   Age at Onset
Family History             Early (E)     Later ($\bar{E}$)    Total
                           (<=18 yrs)    (>18 yrs)
Negative (A)                   28            35                 63
Bipolar (B)                    19            38                 57
Unipolar (C)                   41            44                 85
Unipolar & Bipolar (D)         53            60                113
Total                         141           177                318

Q1. E = Early onset of bipolar disorder. P(E) = ?
  Solution: $P(E) = \frac{141}{318} = 0.4434$ (marginal probability)

Q2. If you select one subject at random, what is the probability they had early onset and no family history?
  Solution: $P(E \cap A) = \frac{28}{318} = 0.0881$ (joint probability)

Q3. Which is an example of two mutually exclusive events?
  i) No family history (A) and early onset (E)
  ii) No family history (A) and family history of bipolar (B)
  Solution: i) $P(A \cap E) > 0$: No. ii) $P(A \cap B) = 0$: Yes.
Q4. Select one subject at random. What is the probability of early onset or no family history (or both)?
  Solution: E and A are not M.E., so use the formula $P(E \cup A) = P(E) + P(A) - P(E \cap A)$.
    $P(E) = \frac{141}{318} = .4434$
    $P(A) = \frac{63}{318} = .1981$
    $P(E \cap A) = \frac{28}{318} = .0881$
    $P(E \cup A) = P(E) + P(A) - P(E \cap A) = .4434 + .1981 - .0881 = .5534$

Q5. For a randomly selected subject, what is the probability of no family history or family history of bipolar only?
  Solution: events A and B are mutually exclusive.
    $P(A \cup B) = P(A) + P(B) = \frac{63}{318} + \frac{57}{318} = .3774$
Q6. What is the probability of some kind of family history (bipolar or unipolar or both)?
  Solution: $P(B \cup C \cup D) = P(\bar{A}) = 1 - P(A) = 1 - .1981 = .8019$

Q7. Randomly select a subject who had early onset. What is the probability that they have no family history? Or, what is the probability of no family history given early onset?
  (Note: we are no longer interested in the whole n = 318; we are only interested in the subset with early onset.)
  Solution: $P(A \mid E) = \frac{28}{141} = .1986$
  or, $P(A \mid E) = \frac{P(A \cap E)}{P(E)} = \frac{.0881}{.4434} = .1987$
  (The two answers differ in the fourth decimal only because the inputs to the second were rounded.)
  Note: another way to derive $P(E \cap A)$ is
    $P(E \cap A) = P(A \mid E)\,P(E) = (.1987)(.4434) = .0881$
Q8. Are early onset (E) and no family history (A) independent?
  Solution: they are independent if $P(E \cap A) = P(E)P(A)$.
    $P(E) = \frac{141}{318} = .4434$
    $P(A) = \frac{63}{318} = .1981$
    $P(E \cap A) = \frac{28}{318} = .0881$
    $P(E)P(A) = (.4434)(.1981) = .0878$
  Not exactly!

Illustration of the Law of Total Probability:
  What is the marginal probability of early onset (E)? Sum the joint probabilities across the M.E.E. events A, B, C, D.
    $P(E) = P(E \cap A) + P(E \cap B) + P(E \cap C) + P(E \cap D)$
    $= .0881 + .0597 + .1289 + .1667 = .4434$
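The answers to Q1 through Q8 can be cross-checked against the table counts with exact fractions, which avoids the small rounding differences seen in the hand calculations. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the table: family history -> (early onset, later onset)
counts = {
    "A": (28, 35),    # negative family history
    "B": (19, 38),    # bipolar
    "C": (41, 44),    # unipolar
    "D": (53, 60),    # unipolar & bipolar
}

total = sum(early + later for early, later in counts.values())
early_total = sum(early for early, _ in counts.values())

p_e = Fraction(early_total, total)             # Q1: marginal P(E)
p_a = Fraction(sum(counts["A"]), total)        # marginal P(A)
p_e_and_a = Fraction(counts["A"][0], total)    # Q2: joint P(E and A)
p_e_or_a = p_e + p_a - p_e_and_a               # Q4: additive rule
p_not_a = 1 - p_a                              # Q6: complement rule
p_a_given_e = p_e_and_a / p_e                  # Q7: conditional P(A | E)
independent = p_e_and_a == p_e * p_a           # Q8: exact independence check

# Law of Total Probability: P(E) as the sum of the joint probabilities
p_e_total = sum(Fraction(early, total) for early, _ in counts.values())

for name, value in [("P(E)", p_e), ("P(A)", p_a), ("P(E and A)", p_e_and_a),
                    ("P(E or A)", p_e_or_a), ("P(not A)", p_not_a),
                    ("P(A | E)", p_a_given_e)]:
    print(name, round(float(value), 4))
print("independent:", independent)
```

The exact check confirms that E and A are not independent in this sample, although the joint probability and the product of the marginals differ only in the fourth decimal place.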
Diagnostic Tests

                              Disease Status
Test Result          Present (D)    Absent ($\bar{D}$)    Total
Positive ($T^+$)         a               b                a + b
Negative ($T^-$)         c               d                c + d
Total                  a + c           b + d              a + b + c + d

False Positive event: $T^+ \cap \bar{D}$
False Negative event: $T^- \cap D$

Sensitivity: $P(T^+ \mid D) = \frac{a}{a+c}$
  probability of a positive test result, given that disease is present

Specificity: $P(T^- \mid \bar{D}) = \frac{d}{b+d}$
  probability of a negative test result, given that the disease is absent

Predictive Value Positive: $P(D \mid T^+) = \frac{a}{a+b}$
  probability that disease is present, given a positive test result

Predictive Value Negative: $P(\bar{D} \mid T^-) = \frac{d}{c+d}$
  probability that disease is absent, given a negative test result
Application of Bayes' Theorem:

Predictive Value Positive
  $P(D \mid T^+) = \frac{P(T^+ \mid D)\,P(D)}{P(T^+ \mid D)\,P(D) + P(T^+ \mid \bar{D})\,P(\bar{D})}$

Let
  $S_e$ = Sensitivity = $P(T^+ \mid D)$
  $S_p$ = Specificity = $P(T^- \mid \bar{D}) = 1 - P(T^+ \mid \bar{D})$
  $p$ = Prevalence of disease = $P(D)$

  $P(D \mid T^+) = \frac{(S_e)(p)}{(S_e)(p) + (1 - S_p)(1 - p)}$

Predictive Value Negative
  $P(\bar{D} \mid T^-) = \frac{P(T^- \mid \bar{D})\,P(\bar{D})}{P(T^- \mid \bar{D})\,P(\bar{D}) + P(T^- \mid D)\,P(D)} = \frac{(S_p)(1 - p)}{(S_p)(1 - p) + (1 - S_e)(p)}$
Example: Cervical Cancer in Women (P&G p. 136-7)

  D = cervical cancer
  T = Pap smear test for cervical cancer

  False negative rate = 16.25%
  False positive rate = 18.64%
  Cervical cancer prevalence = 8.3 per 100,000

Q. What are the predictive values?
  $P(T^- \mid D) = .1625 \Rightarrow S_e = P(T^+ \mid D) = 1 - .1625 = .8375$
  $P(T^+ \mid \bar{D}) = .1864 \Rightarrow S_p = P(T^- \mid \bar{D}) = 1 - .1864 = .8136$
  $p = \frac{8.3}{100{,}000} = .000083$

  $P(D \mid T^+) = \frac{(S_e)(p)}{(S_e)(p) + (1 - S_p)(1 - p)} = \frac{(.8375)(.000083)}{(.8375)(.000083) + (.1864)(.999917)} = .000373$

  $P(\bar{D} \mid T^-) = \frac{(S_p)(1 - p)}{(S_p)(1 - p) + (1 - S_e)(p)} = \frac{(.8136)(.999917)}{(.8136)(.999917) + (.1625)(.000083)} = .999983$
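The predictive-value formulas are easy to wrap in a small function, sketched below with the Pap smear figures from this example. The function name `predictive_values` and its argument names are illustrative, not taken from P&G.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' Theorem.

    PPV = Se*p / (Se*p + (1 - Sp)*(1 - p))
    NPV = Sp*(1 - p) / (Sp*(1 - p) + (1 - Se)*p)
    """
    true_pos = sensitivity * prevalence                # P(T+ and D)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(T+ and not D)
    true_neg = specificity * (1 - prevalence)          # P(T- and not D)
    false_neg = (1 - sensitivity) * prevalence         # P(T- and D)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pap smear example: Se = .8375, Sp = .8136, prevalence = 8.3 per 100,000
ppv, npv = predictive_values(0.8375, 0.8136, 8.3 / 100_000)
print(round(ppv, 6), round(npv, 6))
```

Even with reasonable sensitivity and specificity, the very low prevalence means that almost all positive results are false positives, which is why the predictive value positive is so small.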
Example: Tuberculosis and X-ray screen (P&G p.138-9)

Study sample of N = 1,820 X-ray test results; 30 tuberculosis cases

                              Tuberculosis
X-ray result         No ($\bar{D}$)    Yes (D)    Total
Negative ($T^-$)         1739              8       1747
Positive ($T^+$)           51             22         73
Total                    1790             30       1820

Note: Tuberculosis prevalence is known to be 9.3 per 100,000. The proportion of cases in this sample, $\frac{30}{1820} = .0165$, should not be used for p; instead use p = .000093.

  $S_e = P(T^+ \mid D) = \frac{22}{30} = .7333$
  $S_p = P(T^- \mid \bar{D}) = \frac{1739}{1790} = .9715$

  $P(D \mid T^+) = \frac{(S_e)(p)}{(S_e)(p) + (1 - S_p)(1 - p)} = \frac{(.7333)(.000093)}{(.7333)(.000093) + (.0285)(.999907)} = .002387$

  $P(\bar{D} \mid T^-) = \frac{(S_p)(1 - p)}{(S_p)(1 - p) + (1 - S_e)(p)} = \frac{(.9715)(.999907)}{(.9715)(.999907) + (.2667)(.000093)} = .999974$
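To see why the sample proportion of cases should not stand in for the prevalence, the sketch below computes the predictive value positive twice: once with the sample proportion 30/1820 and once with the known prevalence of 9.3 per 100,000. The helper `ppv` is an illustrative name. Sensitivity and specificity are left unrounded here, so the last digit can differ slightly from the hand calculation.

```python
# Counts from the X-ray table
true_pos, false_neg = 22, 8        # tuberculosis cases: T+ and T-
false_pos, true_neg = 51, 1739     # non-cases: T+ and T-

sensitivity = true_pos / (true_pos + false_neg)     # 22/30
specificity = true_neg / (true_neg + false_pos)     # 1739/1790


def ppv(se, sp, p):
    """Predictive value positive via Bayes' Theorem."""
    return se * p / (se * p + (1 - sp) * (1 - p))


n = true_pos + false_neg + false_pos + true_neg
sample_prevalence = (true_pos + false_neg) / n      # 30/1820, not representative

ppv_sample = ppv(sensitivity, specificity, sample_prevalence)
ppv_population = ppv(sensitivity, specificity, 9.3 / 100_000)

print(round(ppv_sample, 4), round(ppv_population, 6))
```

Plugging in the sample proportion simply reproduces the table ratio 22/73 (about 0.30), which overstates the predictive value in the general population by roughly two orders of magnitude.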
Example: Self-reported BMI and obesity

Data are from a community-based epidemiologic survey. Here the "test" is self-reported BMI ($T^+$ = obese by self-report) and the "disease" is obesity by measured BMI (D).

According to             According to measured BMI:
self-reported BMI:       Not Obese    Obese    TOTAL
Not Obese                  6305        282      6587
Obese                        59        809       868
TOTAL                      6364       1091      7455

Prevalence of obesity = 1091/7455 = 0.1463

  $S_e = P(T^+ \mid D) = \frac{809}{1091} = .74$,  $S_p = P(T^- \mid \bar{D}) = \frac{6305}{6364} = .99$

  $P(D \mid T^+) = \frac{809}{868} = .93$,  $P(\bar{D} \mid T^-) = \frac{6305}{6587} = .96$
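When the sample itself is representative of the population, as in this survey, all four measures can be read directly off the 2x2 table. The sketch below uses the a, b, c, d cell layout from the Diagnostic Tests slide, treating self-reported obesity as the test and measured obesity as the disease status; the variable names are illustrative.

```python
# 2x2 cells in the a, b, c, d layout (rows: test result, columns: disease status)
a = 809     # self-reported obese,     measured obese      (true positive)
b = 59      # self-reported obese,     measured not obese  (false positive)
c = 282     # self-reported not obese, measured obese      (false negative)
d = 6305    # self-reported not obese, measured not obese  (true negative)

n = a + b + c + d

prevalence = (a + c) / n
sensitivity = a / (a + c)      # P(T+ | D)
specificity = d / (b + d)      # P(T- | not D)
ppv = a / (a + b)              # P(D | T+)
npv = d / (c + d)              # P(not D | T-)

print(round(prevalence, 4), round(sensitivity, 2), round(specificity, 2),
      round(ppv, 2), round(npv, 2))
```

No separate prevalence adjustment is applied here because the proportion measured as obese in the sample (1091/7455) is itself the prevalence estimate.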
Relative Risks and Odds Ratios

We want to compare proportions with disease D between Group 1 and Group 2.
  $p_1 = P(D \mid \text{Group 1})$ = Risk of disease in Group 1
  $p_2 = P(D \mid \text{Group 2})$ = Risk of disease in Group 2

  $\text{Odds}_1 = \frac{p_1}{1 - p_1}$,  $\text{Odds}_2 = \frac{p_2}{1 - p_2}$

Measures of disparity in risk between groups:
  Risk Difference = $p_2 - p_1$
  Relative Risk (RR) = $\frac{p_2}{p_1}$
  Odds Ratio (OR) = $\frac{p_2 / (1 - p_2)}{p_1 / (1 - p_1)}$

Low prevalence $\Rightarrow$ the OR is a close approximation to the RR.

More later!
  - tests comparing two proportions (this class)
  - logistic regression is based on the OR (categorical data class)
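The claim that the OR approximates the RR at low prevalence can be illustrated numerically. The risks used below (1% vs 2%, and 20% vs 40%) are hypothetical values chosen for this sketch, not data from P&G; the function names are likewise illustrative.

```python
def relative_risk(p1, p2):
    """RR = p2 / p1, the risk in Group 2 relative to Group 1."""
    return p2 / p1


def odds_ratio(p1, p2):
    """OR = [p2 / (1 - p2)] / [p1 / (1 - p1)]."""
    return (p2 / (1 - p2)) / (p1 / (1 - p1))


# Hypothetical rare disease: risks of 1% and 2%
rr_rare = relative_risk(0.01, 0.02)
or_rare = odds_ratio(0.01, 0.02)

# Hypothetical common disease: risks of 20% and 40%
rr_common = relative_risk(0.20, 0.40)
or_common = odds_ratio(0.20, 0.40)

print(round(rr_rare, 3), round(or_rare, 3))
print(round(rr_common, 3), round(or_common, 3))
```

Both scenarios have RR = 2, but the OR is about 2.02 when the disease is rare and about 2.67 when it is common, because the factor $(1 - p_1)/(1 - p_2)$ separating the two measures is close to 1 only when both risks are small.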