Bayes Theorem
Bayes Theorem is a theorem of probability theory that can be seen as a way of understanding how the probability that a theory is true is affected by a new piece of evidence. It has been used in a wide variety of contexts, ranging from marine biology to the development of Bayesian spam blockers for email systems. In the philosophy of science, it has been used to try to clarify the relationship between theory and evidence. Many insights in the philosophy of science involving confirmation, falsification, the relation between science and pseudoscience, and other topics can be made more precise, and sometimes extended or corrected, by using Bayes Theorem.
Definition
Let $A_1, A_2, \ldots, A_k$ be a collection of $k$ mutually exclusive and exhaustive events with prior probabilities $P(A_i)$ ($i = 1, \ldots, k$). Then for any other event $B$ for which $P(B) > 0$, the posterior probability of $A_j$ given that $B$ has occurred is
$$P(A_j \mid B) = \frac{P(B \mid A_j)\,P(A_j)}{\sum_{i=1}^{k} P(B \mid A_i)\,P(A_i)}$$
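As a quick illustration, the posterior probabilities for a partition can be computed directly from the priors and the likelihoods P(B | A_i). A minimal R sketch follows; the function name bayes_posterior is ours, and the numbers are taken from the rare-disease example that follows.
bayes_posterior <- function(prior, likelihood) {
  # posterior P(A_i | B) for each event in the partition
  prior * likelihood / sum(prior * likelihood)
}
bayes_posterior(prior = c(0.001, 0.999), likelihood = c(0.99, 0.02))
# [1] 0.0472103 0.9527897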
Bayes Example I
Incidence of a rare disease. Only 1 in 1000 adults is afflicted with a rare disease for which a diagnostic test has been developed. The test is such that when an individual actually has the disease, a positive result will occur 99% of the time, whereas an individual without the disease will show a positive test result only 2% of the time. If a randomly selected individual is tested and the result is positive, what is the probability that the individual has the disease? Let A = individual has the disease, and B = positive test result. Then P(A) = 1/1000 = 0.001, P(A') = 1 - 0.001 = 0.999, P(B | A) = 0.99 and P(B | A') = 0.02.
Bayes Example II
If a randomly selected individual is tested and the result is positive, what is the probability that the individual has the disease?
$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A')P(A')} = \frac{P(A \cap B)}{P(B)}$$
where $P(B) = P(B \mid A)P(A) + P(B \mid A')P(A')$, so
$$P(A \mid B) = \frac{(0.99)(0.001)}{(0.99)(0.001) + (0.02)(0.999)} = 0.0472103$$
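The same number can be checked with a few lines of R (a minimal sketch; the variable names are ours):
pA <- 0.001                                      # prior P(A)
pB_given_A <- 0.99                               # P(B | A), positive test given disease
pB_given_Ac <- 0.02                              # P(B | A'), positive test without disease
pB <- pB_given_A * pA + pB_given_Ac * (1 - pA)   # total probability P(B)
pB_given_A * pA / pB                             # posterior P(A | B) = 0.0472103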
Sampling with replacement vs. without replacement
Suppose we have a bowl of 100 unique numbers from 0 to 99. We want to select a random sample of numbers from the bowl. After we pick a number from the bowl, we can put the number aside or we can put it back into the bowl. If we put the number back in the bowl, it may be selected more than once; if we put it aside, it can be selected only one time. When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement. Sampling with replacement tends to give more extreme (variable) samples than sampling without replacement.
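In R, both schemes can be drawn with sample(); the short sketch below (the sample size and seed are our choices, not values from the notes) illustrates the difference.
set.seed(1)                               # for reproducibility
bowl <- 0:99
sample(bowl, size = 5, replace = TRUE)    # with replacement: repeats are possible
sample(bowl, size = 5, replace = FALSE)   # without replacement: all values distinct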
Sampling with and without replacement
Suppose we have a population as follows: 0, 1, 2, 3, 4. Take samples of size 2 with and without replacement.
With replacement: (0,0), (0,1), (0,2), (0,3), (0,4), (1,1), (1,2), (1,3), (1,4), (2,2), (2,3), (2,4), (3,3), (3,4), (4,4)
Without replacement: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
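These enumerations can be reproduced in R (a sketch; the object names pop, with_repl and without_repl are ours):
pop <- 0:4
without_repl <- t(combn(pop, 2))                             # the 10 unordered pairs without replacement
with_repl <- subset(expand.grid(a = pop, b = pop), a <= b)   # the 15 unordered pairs with replacement
nrow(without_repl); nrow(with_repl)                          # 10 and 15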
Models for Discrete Random Variables (rv)
Each random variable can be considered as being obtained by probability sampling from its own sample space; this viewpoint leads to a classification of sampling experiments, and the corresponding variables, into classes. Random variables within each class share common traits and parameter estimation methods. These classes of probability distributions are also called probability models.
Bernoulli trials
A Bernoulli trial or experiment is one whose outcome can be classified as either a success or a failure. The Bernoulli random variable X takes the value 1 if the outcome is a success and 0 if it is a failure. Examples of Bernoulli trials: flipping a coin, or taking a product randomly from a production line and classifying it as a success if the product is defective or a failure if it is not.
Bernoulli pmf and CDF
If the probability of success is p and the probability of failure is 1 - p, the pmf and CDF of X are:
x      0      1
p(x)   1-p    p
F(x)   1-p    1
The binomial distribution is an extension (in the same family of probability models) of the Bernoulli distribution. The binomial random variable is the number of successes, k, in n Bernoulli trials.
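As an illustration (a sketch with an arbitrary p = 0.3, not a value from the notes), Bernoulli trials and the binomial count can be generated and evaluated in R:
p <- 0.3
rbinom(10, size = 1, prob = p)    # 10 Bernoulli trials: a vector of 0s and 1s
rbinom(1, size = 10, prob = p)    # one binomial count: number of successes in 10 trials
dbinom(4, size = 10, prob = p)    # P(exactly 4 successes in 10 trials)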
The Hypergeometric Distribution
The assumptions leading to the hypergeometric distribution are:
1. The population or set to be sampled consists of N elements (a finite population).
2. Each individual element can be characterized as a success (S) or a failure (F), and there are M successes in the population.
3. A sample of n individuals is selected without replacement in such a way that each subset of size n is equally likely to be chosen.
Let X be the number of successes in the sample.
Hypergeometric pmf
$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
$$EX = \frac{Mn}{N} \qquad VX = \left(\frac{N-n}{N-1}\right)\frac{Mn}{N}\left(1 - \frac{M}{N}\right) \qquad SDX = \sqrt{VX}$$
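For concreteness, the pmf can be written out directly with choose() in R (a sketch; the function name hyper_pmf is ours) and compared with the built-in dhyper() introduced later:
hyper_pmf <- function(x, M, N, n) choose(M, x) * choose(N - M, n - x) / choose(N, n)
hyper_pmf(2, M = 5, N = 25, n = 10)   # 0.3853755, the tagging example below
dhyper(2, 5, 20, 10)                  # same value from the built-in pmf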
Hyper Example I
Five individuals from an animal population thought to be near extinction in a certain region have been caught, tagged, and released to mix into the population. After they have had an opportunity to mix, a random sample of 10 of these animals is selected. Let X = the number of tagged animals in the second sample. If there are actually 25 animals of this type in the region, find the following:
1. The probability that exactly 2 are tagged (P(X = 2))
2. The probability that at most 2 are tagged (P(X <= 2))
3. EX, VX, SDX
4. Suppose the population size N is not actually known; the value of x is observed and we can use it to estimate N via $\hat{N} = \frac{Mn}{x}$. Suppose now M = 100, n = 40 and x = 16.
Hyper Example II
1. $P(X = 2) = \frac{\binom{5}{2}\binom{20}{8}}{\binom{25}{10}} = 0.3853755$
2. $P(X \le 2) = P(2) + P(1) + P(0) = \frac{\binom{5}{2}\binom{20}{8}}{\binom{25}{10}} + \frac{\binom{5}{1}\binom{20}{9}}{\binom{25}{10}} + \frac{\binom{5}{0}\binom{20}{10}}{\binom{25}{10}} = 0.6988142$
Hyper Example III
3. EX, VX, SDX:
M=5; n=10; N=25
EX=M*n/N; VX=((N-n)/(N-1))*(M*n/N)*(1-M/N)
SDX=sqrt(VX)
EX; VX; SDX
[1] 2
[1] 1
[1] 1
4. $\hat{N} = \frac{Mn}{x} = \frac{(100)(40)}{16} = 250$
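The same estimate can be computed in R (a trivial sketch; the variable name Nhat is ours):
M <- 100; n <- 40; x <- 16
Nhat <- M * n / x
Nhat   # [1] 250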
R code
Use dhyper() for pmf calculations (single probabilities) or phyper() for the CDF P(X <= x).
dhyper(x, m, n, k) (and phyper(x, m, n, k, lower.tail = T)), where:
x: the argument of interest, x
m: M, the number tagged in the population, i.e. the number of successes
n: N - M, the number of elements in the population minus the number tagged
k: n, the sample size
lower.tail = T: for use in phyper() only; logical, default is T, so calculations give P(X <= x)
Use sum() with dhyper(x, m, n, k) to calculate probabilities over intervals.
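For example (a short sketch using the tagging example's numbers), phyper() reproduces the cumulative probability obtained by summing dhyper():
phyper(2, m = 5, n = 20, k = 10)          # P(X <= 2) = 0.6988142
sum(dhyper(0:2, m = 5, n = 20, k = 10))   # same value by summing the pmf
phyper(2, 5, 20, 10, lower.tail = FALSE)  # upper tail: P(X > 2)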
Previous example with full code
M=5; n=10; N=25
# P(X=2), P(X<=2)
dhyper(2, M, N-M, n)
[1] 0.3853755
sum(dhyper(0:2, M, N-M, n))
[1] 0.6988142
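As a follow-up sketch, the full pmf of X for this example can be tabulated in one line (the rounding to 4 digits is our choice):
M <- 5; n <- 10; N <- 25
round(dhyper(0:5, M, N-M, n), 4)   # P(X = 0), ..., P(X = 5)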