ACMS Statistics for Life Sciences. Chapter 9: Introducing Probability

ACMS 20340 Statistics for Life Sciences Chapter 9: Introducing Probability

Why Consider Probability? We're doing statistics here. Why should we bother with probability? As we will see, probability plays an important role in statistics.

An Example I In a (very) recent Gallup study on the role of religion in one's views on violence, we find the following statement: "Results are based on face-to-face interviews with approximately 1,000 adults in each country, aged 15 and older, from 2008 through 2010. For results based on the total sample of adults, one can say with 95% confidence that the maximum margin of sampling error ranges from ±1.66 to ±5.8 percentage points." Source: gallup.com

An Example II What is meant by the claim that one can say with 95% confidence that the maximum margin of sampling error ranges from ±1.66 to ±5.8 percentage points? This means that the probability that the estimate from the samples comes within the given margin of error is 0.95.

Another Example Recall: A simple random sample of size 10 taken from this class means that every possible group of size 10 has an equal chance of being selected. What do we mean when we say a group has an equal chance of being selected? A class of size 50 has 10,272,278,170 possible samples of size 10.
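The count quoted above can be checked directly. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); the variable names are illustrative and not part of the course material.

```python
import math

class_size = 50   # students in the class (from the example)
sample_size = 10  # size of the simple random sample

# Number of distinct groups of 10 that can be drawn from 50 students.
num_samples = math.comb(class_size, sample_size)
print(f"{num_samples:,}")

# "Equal chance" means each group is selected with probability 1 / num_samples.
chance_per_group = 1 / num_samples
print(chance_per_group)
```

Every one of these groups has the same, very small, probability of being the sample chosen.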

What is Probability? This is a difficult philosophical question. Following the textbook, we will define probability in terms of the long run behavior of random phenomena. Why the long run behavior of random phenomena? Chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run.

Short Run vs. Long Run Suppose I toss a fair (or unbiased) coin. What will the outcome be? One can't know for certain: the outcome is unpredictable in the short run.

Short Run vs. Long Run However, if I toss the coin a sufficiently large number of times, the outcomes start to settle down. [Figure: two trials of 5000 tosses each.]
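The settling-down behavior can be reproduced with a small simulation. This is only a sketch: it assumes Python's pseudo-random generator is an adequate stand-in for a fair coin, and the function name and seeds are hypothetical choices made for illustration.

```python
import random

def running_proportion_of_heads(num_tosses, seed):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        proportions.append(heads / toss)
    return proportions

# Two trials of 5000 tosses each, as in the slide.
for seed in (1, 2):
    props = running_proportion_of_heads(5000, seed)
    checkpoints = [round(props[n - 1], 3) for n in (10, 100, 1000, 5000)]
    print(f"trial with seed {seed}: {checkpoints}")
```

The early proportions can wander noticeably, while the later ones should sit close to 1/2.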

Further Confirmation
              Buffon   Kerrich   Pearson
Total Tosses   4,040    10,000    24,000
Heads          2,048     5,067    12,012
Proportion    0.5069    0.5067    0.5005
(These guys had too much time on their hands.)

Randomness and Probability A phenomenon is random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions. Always keep the example of the tosses of a coin in mind! The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. As we toss the fair coin more and more, the proportion of the occurrence of heads gets closer and closer to 1/2.

Examples of randomness? The outcome of a coin toss. The time between emissions of particles by a radioactive source. The sexes of the next litter of lab rats. The outcome of a random sample or randomized experiment.

Probability Models 1 Suppose we are studying a certain random phenomenon. Consider, for example, the birth of a child.

Probability Models 2 What will the outcome be? That is, will the child be male or female? We can't know (too far) in advance. Here's what we do know: 1. The outcome will be either male or female. 2. The probability of each outcome is (roughly) 1/2.

Probability Models 3 Thus we've described: 1. A list of possible outcomes. 2. A probability for each outcome. These correspond to the two components of a probability model. Before defining a probability model, we need a bit more terminology.

Probability Models 4 The sample space S of a random phenomenon is the set of all possible outcomes. An event is an outcome or set of outcomes of a random phenomenon. Thus, an event is a subset of the sample space. For example, if S = {1, 2, 3, 4, 5, 6, 7, 8, 9} is a set of outcomes, then E = {2, 4, 6, 8} is an event. Careful: An event need not be an individual outcome!

Finally... A probability model is the description of a random phenomenon consisting of 1. a sample space S, and 2. a way of assigning probabilities to events in S.

Examples of Sample Spaces S = {M, F}; S = {Republican, Democrat, Independent}; S = {weights of 1,000 individuals in a sample}

A Baby-friendly Example Suppose a couple plans to have three children. Let S be the set of possible numbers of girls they can have. That is, S = {0, 1, 2, 3}. What is the probability of each outcome in S (assuming that the probability of a girl is 1/2)? Incorrect answer: Each outcome is equally likely, so each has probability 1/4.

The Possible Outcomes Possible outcomes with one child: {B, G} Possible outcomes with two children: {BB, BG, GB, GG} Possible outcomes with three children: {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG} Now, these outcomes are equally likely.

Calculating the Probabilities Of the 8 equally likely outcomes BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG: Probability of no girls? 1/8 (BBB). Probability of exactly one girl? 3/8 (BBG, BGB, GBB). Probability of exactly two girls? 3/8 (BGG, GBG, GGB). Probability of three girls? 1/8 (GGG).
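The counting above can be done mechanically by listing all eight birth orders and tallying the girls in each. This is a minimal sketch under the slide's assumption that each child is a girl with probability 1/2, so all eight orders are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely birth orders for three children (B = boy, G = girl).
outcomes = ["".join(order) for order in product("BG", repeat=3)]

# Count how many birth orders give each number of girls.
girls_count = Counter(order.count("G") for order in outcomes)

# Each order has probability 1/8, so P(k girls) = (orders with k girls) / 8.
prob_girls = {k: Fraction(n, len(outcomes)) for k, n in sorted(girls_count.items())}

for k, p in prob_girls.items():
    print(f"P({k} girls) = {p}")
```

The four probabilities add to 1, as they must for a complete sample space.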

Another Example: Blood Types Let S = {O+, O−, A+, A−, B+, B−, AB+, AB−}. If we choose an American at random, what is the probability that this person has, say, blood type O+? Where do we get these probabilities? These are the frequencies of occurrence of each blood type; we get them by taking lots and lots of samples.

A Donation Problem Suppose that we need blood for someone with blood type AB−. What is the probability that a randomly selected American has the right blood to donate? Individuals with blood type O−, A−, B− and AB− can donate to this person. So what we're looking for is the probability of the event E = {O−, A−, B−, AB−}. How do we find this probability?

General Rules of Probability: Rules 1 and 2 Rule 1: Every probability is a number between 0 and 1. That is, if A is any event in S, then 0 ≤ P(A) ≤ 1. What if P(A) = 0? When S is finite, this means that A is impossible. What if P(A) = 1? When S is finite, this means that A must occur. Rule 2: The event consisting of all outcomes in the sample space has probability 1.

General Rules of Probability: Rule 3 Rule 3: If two events have no outcomes in common, then the probability that one or the other occurs is the sum of their individual probabilities. When two events have no outcomes in common, we say that they are disjoint.

General Rules of Probability: Rule 3 (continued) Rule 3: If A and B are disjoint, then P(A or B) = P(A) + P(B). (This is sometimes called the addition rule.) In general, if A and B are any two events in S, then P(A or B) = P(A) + P(B) − P(A and B).

General Rules of Probability: Rule 4 Rule 4: The probability that an event does not occur is 1 minus the probability that the event does occur. P(A does not occur) = 1 − P(A)
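The four rules can be checked on a small finite model. The sketch below reuses the sample space S = {1, ..., 9} and the event {2, 4, 6, 8} from the earlier example, and assumes (this is not stated in the slides) that all nine outcomes are equally likely. The events B and C are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

S = set(range(1, 10))   # sample space from the earlier example
A = {2, 4, 6, 8}        # the event E from that example
B = {1, 2, 3}           # hypothetical event that overlaps A (shares outcome 2)
C = {5, 7}              # hypothetical event disjoint from A

def prob(event):
    """P(event), assuming every outcome in S is equally likely."""
    return Fraction(len(event & S), len(S))

# Rule 1: probabilities lie between 0 and 1.
print("P(A) =", prob(A))

# Rule 2: the whole sample space has probability 1.
print("P(S) =", prob(S))

# Rule 3 (disjoint events): P(A or C) = P(A) + P(C).
print(prob(A | C), "=", prob(A) + prob(C))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
print(prob(A | B), "=", prob(A) + prob(B) - prob(A & B))

# Rule 4: P(A does not occur) = 1 - P(A).
print(prob(S - A), "=", 1 - prob(A))
```

Exact fractions are used so that each equality holds without rounding error.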

Back to the Donation Problem The addition rule holds for more than just two disjoint events: P(O− or A− or B− or AB−) = P(O−) + P(A−) + P(B−) + P(AB−) = 0.07 + 0.06 + 0.02 + 0.01 = 0.16. We also used the addition rule in the example with the couple having three children. Can you see where?
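The same sum can be written out in code. This is a minimal sketch using only the four probabilities quoted in the slide for the Rh-negative types (written here as O-, A-, B-, AB-); the dictionary and variable names are illustrative.

```python
from fractions import Fraction

# Probabilities quoted in the slide for the four Rh-negative blood types.
p_type = {
    "O-": Fraction("0.07"),
    "A-": Fraction("0.06"),
    "B-": Fraction("0.02"),
    "AB-": Fraction("0.01"),
}

# The four types are disjoint events, so the addition rule applies.
p_can_donate = sum(p_type.values())
print("P(can donate) =", float(p_can_donate))

# Rule 4 gives the probability that a random person cannot donate.
p_cannot_donate = 1 - p_can_donate
print("P(cannot donate) =", float(p_cannot_donate))
```

Using `Fraction` rather than floats keeps the sum exact.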

Discrete Probability Models So far, the probability models we've considered are discrete probability models. A probability model is discrete if the sample space is made up of a list of individual outcomes (the first outcome, the second outcome, the third outcome, ...). To assign probabilities in a discrete model, we merely list the probabilities of all the individual outcomes.

Continuous Probability Models What kind of probability model should we use for continuous quantitative variables? These can take any number in a range of possible values. First try: Histograms! [Figure: heights (inches) of women age 40 to 49 in the U.S.] (Ignore the curve... just look at the histogram.)

Calculating Probabilities 1 On this graph the bins are intervals of 1 inch. What if we want to know the probability someone is within half an inch of 60 inches? P(59.5 ≤ X ≤ 60.5) = ?

Calculating Probabilities 2 We could keep asking for probabilities of smaller and smaller intervals. But then there are an infinite number of possible events! This is a problem. Is there an easier way?

Continuous Distributions Solution: Use a curve to indicate the different outcomes, and let the probability of any given interval of values be the area under the curve. These curves are called density curves.

Density Curves All density curves have the following properties. The curve is always on or above the x-axis. The total area under the curve is equal to 1.

Continuous Probability Models: The Official Definition A continuous probability model gives a density curve and assigns the probability of every interval as the area under the curve for that interval.

A Warning about Density Curves No set of real data is exactly described by a density curve. The curve is a model. That is, the curve is an idealized description that is easy to use and accurate enough for practical use. Think of the density curve like you would the regression line: Least-squares regression models a linear trend, and is used to make predictions about similar individuals in the population.

Example: The Uniform Distribution on the Interval [a, b]

Example: Exponential Distributions

Finding Probabilities with a Density Curve Let's consider the uniform distribution on [0,1]: What is 1. P(X ≤ 0.5)? 2. P(X = 0.5)? 3. P(X < 0.5)? 4. P(X ≤ 0.5 or X > 0.8)? 1. The area under the curve for the region x ≤ 0.5 is a 0.5 × 1 rectangle, and so P(X ≤ 0.5) = 0.5. 2. The area under the curve for this region is a 0 × 1 rectangle, and so P(X = 0.5) = 0.

Finding Probabilities with a Density Curve (continued) 3. Observe: P(X < 0.5) + P(X = 0.5) = P(X ≤ 0.5). Thus, P(X < 0.5) = P(X ≤ 0.5) = 0.5. 4. The two regions are disjoint, so by the addition rule P(X ≤ 0.5 or X > 0.8) = P(X ≤ 0.5) + P(X > 0.8) = 0.5 + 0.2 = 0.7.
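These four answers are all areas of rectangles, which can be computed directly. The sketch below assumes the standard uniform density on [a, b], a flat curve of height 1/(b − a) so that the total area is 1; the function name is hypothetical.

```python
def uniform_interval_prob(lo, hi, a=0.0, b=1.0):
    """Area under the uniform density on [a, b] between lo and hi."""
    height = 1.0 / (b - a)                      # flat density, total area 1
    width = max(0.0, min(hi, b) - max(lo, a))   # overlap of [lo, hi] with [a, b]
    return width * height

p_at_most_half = uniform_interval_prob(0.0, 0.5)   # question 1
p_exactly_half = uniform_interval_prob(0.5, 0.5)   # question 2: zero-width rectangle
p_less_than_half = p_at_most_half - p_exactly_half # question 3
# Question 4: disjoint intervals, so the areas add.
p_either = uniform_interval_prob(0.0, 0.5) + uniform_interval_prob(0.8, 1.0)

for label, p in [("P(X <= 0.5)", p_at_most_half),
                 ("P(X = 0.5)", p_exactly_half),
                 ("P(X < 0.5)", p_less_than_half),
                 ("P(X <= 0.5 or X > 0.8)", p_either)]:
    print(f"{label} = {p:.2f}")
```

Because a single point contributes a rectangle of width 0, including or excluding an endpoint never changes the probability.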

Another Warning! All continuous probability models assign probability 0 to any individual outcome. Only intervals of values have positive probability.

Measures of Center: Median and Mean Q1: What is the median of a density curve? A1: It is the point a such that half the area is to the left and half the area is to the right: P(X < a) = P(X > a). Q2: What is the mean of a density curve? A2: It is the point a such that the curve would balance if the area under the curve were cut from a block of wood and held at point a. For density curves, we use µ for the mean, and σ for the standard deviation. (Why? We will see in Chapter 13.)
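For a skewed density curve the two measures differ. As an illustration only, the sketch below assumes an exponential density with rate 1, f(x) = e^(−x) for x ≥ 0 (the slides name exponential distributions but give no parameters). It finds the median as the equal-areas point by bisection and the mean as the balance point, the integral of x·f(x), by a midpoint-rule sum. All function names are hypothetical.

```python
import math

def density(x):
    """Assumed example: exponential density with rate 1, f(x) = e^(-x), x >= 0."""
    return math.exp(-x)

def area_left_of(a):
    """P(X < a): area under the density from 0 to a, which is 1 - e^(-a)."""
    return 1.0 - math.exp(-a)

def median_by_bisection(lo=0.0, hi=50.0, tol=1e-10):
    """Find the point with half the area on each side."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_left_of(mid) < 0.5:
            lo = mid   # too little area to the left: move right
        else:
            hi = mid   # enough area to the left: move left
    return (lo + hi) / 2

def mean_by_midpoint_rule(upper=50.0, steps=200_000):
    """Approximate the balance point: integral of x * f(x) over [0, upper]."""
    h = upper / steps
    return sum((i + 0.5) * h * density((i + 0.5) * h) for i in range(steps)) * h

median = median_by_bisection()
mean = mean_by_midpoint_rule()
print("median:", round(median, 4))
print("mean:  ", round(mean, 4))
```

For this right-skewed curve the long right tail pulls the mean above the median.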

Comparing Median and Mean

Random Variables When we write P(X > 7), what exactly is X? X is a variable that represents the outcome of a random phenomenon. We call it a random variable. X may take any of the possible outcomes, and the assignment of probabilities to those values (or to intervals of values) is called the probability distribution of X. X may be either discrete or continuous. The X for number of daughters is discrete and finite. The X for heights of women is continuous.