Chapter 3

Section 3.1: The probability histogram

Below is the probability histogram for the IS (p. 80 of text).

[Figure, p. 80 of text: the probability histogram for the IS.]

How is it determined?

On a horizontal number line, list all possible values of x; in this case: −0.90, −0.70, −0.50, −0.30, −0.10, 0.10, 0.30, 0.50, 0.70, 0.90.

Draw rectangles:
- Each x serves as the center of the base of its rectangle.
- The base of each rectangle equals δ (this means the rectangles touch, but do not overlap).
- The height of the rectangle centered at x is P(X = x)/δ.

Below is the verification of two of the heights given in our picture:

    P(X = −0.10)/0.20 = 0.3151/0.20 = 1.5755
    P(X = 0.30)/0.20 = 0.1500/0.20 = 0.7500

Why this strange definition of height? B/c we want to consider areas. The area of the rectangle centered at x is, of course, its base times its height: δ[P(X = x)/δ] = P(X = x). Here is the main thing to remember:

In a probability histogram, the area of a rectangle equals the probability of its center value.

Once we know this, we can see the symmetry in the sampling distribution for the IS.

We are now going to add another adjective (remember actual?). The sampling distribution of Chapter 2 will be called the exact sampling distribution and it yields the exact P-value. We say this b/c in Chapter 3 we will learn two ways to approximate a sampling distribution: computer simulation and fancy math.

Section 3.2: Computer simulation

In the text, I talk about a Colloquium Study (CQS). Its data are below (the cell counts are reconstructed from the totals used later in these notes):

    Treat.   S    F   Total
    1        7    7    14
    2        1   13    14
    Total    8   20    28

It can be shown that there are 40,116,600 possible assignments, and that 12,932,920 of these give x = 0 (a = 4). Thus, P(X = 0) = 12,932,920/40,116,600 = 0.3224. The idea of the computer simulation approximation is quite simple: Perhaps we can obtain a good approximation to any probability by looking at some, not all, of the assignments.
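
To see where these counts come from, here is a minimal sketch (not from the text) using Python's math.comb; the margins n1 = n2 = 14 and m1 = 8 are assumptions inferred from the totals quoted above:

```python
from math import comb

# Margins assumed from the counts quoted above (an inference, not stated
# explicitly in this transcription): two treatment groups of 14 subjects,
# 8 total successes and 20 total failures.
n1, n2, m1, m2 = 14, 14, 8, 20
n = n1 + n2

total = comb(n, n1)                          # number of possible assignments
favorable = comb(m1, 4) * comb(m2, n1 - 4)   # assignments with a = 4, i.e., x = 0

print(total)               # 40116600
print(favorable)           # 12932920
print(favorable / total)   # 0.3224 (rounded), i.e., P(X = 0)
```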

For example, I looked at 10,000 assignments for the CQS and discovered that 3267 of them gave a = 4 and x = 0. Thus, the relative frequency (RF) of occurrence of 0 is 3267/10,000 = 0.3267, which is very close to its probability, 0.3224. But I am ahead of myself. Once we decide to look at only some of the assignments, two questions arise.

1. How many should we look at? The answer is called the number of runs of the computer simulation. As we shall see, 10,000 is a good choice for the number of runs.
2. Which ones should we look at? Well, to avoid bias, we select assignments at random. If you want to see more details on this, read pages 84 and 85 in the text. But you don't need to understand this.

Below are the results of my computer simulation study with 10,000 runs for the CQS: a table listing, for each possible value of x (−4/7, −3/7, ..., 4/7), its RF and its exact probability. First, note that the RFs and probabilities are close. (Remember Section 2.4.)

We can use the RFs to approximate the P-value. The ingredients: the actual x = 3/7, and the alternative was >. Thus, the exact P-value is P(X ≥ 3/7), and we approximate it by RF(X ≥ 3/7).

The picture below is taken from p. 87 of the text. [Figure, p. 87 of text: relative frequencies of the possible values of x based on 10,000 runs.]

Results of a simulation experiment with 10,000 runs for the Ballerina study: a table of each possible value of x and its relative frequency. Recall, the P-value is P(X ≤ −0.24), which we approximate with RF(X ≤ −0.24).
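
A sketch of what such a simulation might look like, again assuming the CQS margins inferred above; the seed is hypothetical, chosen only so the run is reproducible, and your RFs will differ a little from mine:

```python
import random
from collections import Counter

# Assumed CQS margins (as in the sketch above): 8 successes and 20 failures,
# split at random into two treatment groups of 14 subjects each.
random.seed(0)                    # hypothetical seed, only for reproducibility
cards = [1] * 8 + [0] * 20
runs = 10_000

counts = Counter()
for _ in range(runs):
    random.shuffle(cards)         # one random assignment
    a = sum(cards[:14])           # successes that land in treatment 1
    counts[a] += 1                # x = (a - 4)/7, so a = 4 corresponds to x = 0

for a in sorted(counts):
    print(f"x = {a - 4}/7: RF = {counts[a] / runs}")

# Approximate P-value for the alternative >: RF(X >= 3/7) = RF(a >= 7).
print(sum(counts[a] for a in counts if a >= 7) / runs)
```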

Results of a simulation experiment with 10,000 runs for the Crohn's study: a table of each possible value of x and its relative frequency. Recall, the P-value = P(X ≥ 0.27), which we approximate by RF(X ≥ 0.27).

Section 3.3: Center and Spread

Page 89 of the text shows probability histograms for four studies in the text. In each picture, if you sum the areas of the teal colored rectangles, you get the P-value. There are two facts (for all FTs) that are revealed in these pictures:
- Each picture has one central peak. The peak can be one or two rectangles wide, but never three.
- As you move away from the central peak in either direction, the rectangles become shorter and shorter.

It is useful to have a concept of a left-to-right center for such a picture. Clearly, if the picture is symmetric, then its center is 0. To include all pictures, we define the center of the PH to be its center of gravity. It can be shown that for every FT, the center of gravity is 0. In general, the center of gravity, or mean, of a PH is denoted by the Greek letter mu: µ. For FT, µ = 0. (In Chapter 5, we will have pictures for which µ ≠ 0.)

Thus, all FTs are similar in that they all have µ = 0. But the pictures on page 89 look very different. This is b/c they have different amounts of spread. For the four pictures, IS has the most spread, then CQS, then Soccer, and finally CCD has the least spread. (This is an obvious visual assessment.)

We need more than a visual assessment of spread. We need a number that summarizes the spread in a PH. That number is the standard deviation of the PH, denoted by the Greek letter sigma: σ. There is a simple formula for calculating σ:

    σ = √[ m1 m2 / ( n1 n2 (n − 1) ) ].

The standard deviations for the four pictures on page 89 are given in the text; note that more spread corresponds to a larger σ. Why do we want to measure spread? Be patient please.
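
Assuming the formula reads as reconstructed above, a short sketch recovers the σ values used later in these notes for the CCD and the BSS:

```python
from math import sqrt

def sd_of_x(n1, n2, m1, m2):
    """Standard deviation of X = p1-hat minus p2-hat for FT (formula above)."""
    n = n1 + n2
    return sqrt(m1 * m2 / (n1 * n2 * (n - 1)))

# CCD margins (22/37 vs 11/34 successes, quoted later in these notes):
print(round(sd_of_x(37, 34, 33, 38), 4))   # 0.1193
# BSS margins (n1 = n2 = 25, m1 = 38, m2 = 12, quoted later in these notes):
print(round(sd_of_x(25, 25, 38, 12), 4))   # 0.122
```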

Recall that X is the test statistic for FT. It is also called a random variable. Let X be any random variable, with mean µ and standard deviation σ. Define the standardized version of X to be Z, where

    Z = (X − µ)/σ.

Transforming X to Z is called standardizing X. The observed value of Z is denoted by z and is computed by z = (x − µ)/σ. For FT, b/c µ = 0: Z = X/σ and z = x/σ.

In Chapter 2, we learned about the sampling distribution for X. In a similar way, Z has a sampling distribution. Assuming we have the sampling distribution for X, it is very easy to get the sampling distribution for Z. We will illustrate with the IS.

Recall for the IS: σ = 0.2283 and the possible values of x are −0.90, −0.70, ..., 0.90. Thus, z = x/0.2283 and we get all possible values of z by taking every possible value of x and dividing it by 0.2283. Namely, −0.90/0.2283 = −3.94, −0.70/0.2283 = −3.07, ..., 0.90/0.2283 = 3.94. The probabilities for the z's are automatic. For example, the event that z = −3.94 is the same event as x = −0.90, so they have the same probability. To summarize, given the sampling distribution for X, it is easy to get the sampling distribution for Z. You don't need to worry about reproducing these details; this result is simply to motivate what happens next.

B/c Z has a sampling distribution, we can draw its probability histogram and I have done so on page 94 of the text. (Ignore the teal color and smooth curve on p. 94.) Compare the pictures for the IS on pages 89 and 94 of the text: These pictures have the same shape and both are centered at 0. They differ in their spreads.

I repeated the above procedure (standardizing) for the other three probability histograms on p. 89. The details will not be given. The pictures I get for the three Z's are on pages 95 and 96. As with the IS, if you compare each picture for X on page 89 with its picture for Z, you will find that the two pictures have the same shape and both are centered at 0, but they have different spreads.

Let's go back to the picture for Z for the IS on page 94. B/c Z is equivalent to X, we may use X or Z to find the P-value. To make this precise: P(X ≥ 0.30) = P(Z ≥ 1.31) and P(X ≤ −0.30) = P(Z ≤ −1.31).
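
A tiny sketch (not from the text) of the X-to-Z conversion for the IS; the value sigma = 0.2283 is an assumption, backed out from the quoted 0.90/sigma = 3.94:

```python
# Converting the sampling distribution of X into that of Z for the IS.
sigma_IS = 0.2283                                 # assumed value (see note above)
x_values = [k / 10 for k in range(-9, 10, 2)]     # -0.90, -0.70, ..., 0.90

for x in x_values:
    # Each z inherits the probability of its x; only the labels change.
    print(f"x = {x:+.2f}  ->  z = {x / sigma_IS:+.2f}")
```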

Thus, the area of the teal-shaded rectangles on page 94 is the P-value for the IS. Similarly, the area of the teal-shaded rectangles is the P-value for each of the three pictures on pages 95 and 96.

Now, look at the four pictures on pages 94-96. They look very similar. Why? B/c the mean of each picture is 0 and the standard deviation of each picture is 1. This is what standardizing does to a picture: It creates a new picture which has the same shape as the old picture. In addition, the new picture is centered at 0 and is scaled to have a standard deviation of 1.

Working with X, different data sets give very different pictures (see page 89), but working with Z, different data sets give very similar pictures. But why do we desire similar pictures? This is where the smooth curve enters the argument. B/c the Z pictures are similar, we can use one curve as an approximation to each of them.

[Figure: the rectangle of the Z histogram centered at 1.31, with regions A, B and C marked. Exact Area = B + C; Approximate Area = A + B. If A = C, then the approximation is perfect.]

The smooth curve is called the standard normal curve (snc). The approximation method motivated above is useful only if it is easy to find areas under the snc. Two facts: The snc is symmetric around 0 and its total area is 1.

Suppose that I want to find the area under the snc to the right of some z. Good news: The table in the front of the book is designed to answer this question. First, take the z and divide it into two pieces: its value to one decimal place (for example, 1.3) and its second decimal. Then go to the table: find the row for the first piece and the column for the second piece.
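
If you do not have the table handy, the same right-tail areas can be computed directly. A minimal sketch (not from the text) using Python's math.erfc; the particular z values below are just illustrations:

```python
from math import erfc, sqrt

def area_right_of(z):
    """Area under the snc to the right of z: what the table in the book reports."""
    return 0.5 * erfc(z / sqrt(2))

print(round(area_right_of(1.47), 4))    # 0.0708
print(round(area_right_of(1.27), 4))    # 0.102, i.e. the table's 0.1020
print(area_right_of(3.60))              # about 0.00016: beyond the table, essentially zero
```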

For another example, suppose we want the area under the snc to the right of a z that begins with 1.5: break the z into 1.5 and its second decimal and read the table the same way.

Suppose we want the area under the snc to the right of a z that has more than two decimal places. The difficulty is: What to do with the extra digits? In my class, just round off the z to two decimals, say 1.47, and proceed as above.

Finally, suppose we want the area under the snc to the right of a z that is larger than anything in the table. The difficulty is that the table goes no higher than 3.59. But read the 3.5 row in the table (does it remind you of the movie "The Shining"?). Fact: For any z > 3.59, the area under the snc to the right of z is smaller than 0.0002; for practical purposes it is 0.

What is the area under the snc to the left of z = −1.27? Thinking like the MITK, use symmetry to realize that the area to the left of z = −1.27 equals the area to the right of z = +1.27, and we know how to find this latter area (it is 0.1020). Fact: The area under the snc to the left of any z equals the area under the snc to the right of −z.

Now we are ready to use the snc to obtain an approximate P-value for FT. Consider again the CCD. Recall that the exact P-value is, symbolically, P(X ≥ 0.2711). Also, recall that for the CCD, σ = 0.1193. Thus, P(X ≥ 0.2711) = P(X/σ ≥ 0.2711/0.1193) = P(Z ≥ 2.27).

Look at the picture on p. 96 of the text. This P-value is the area of the rectangle centered at 2.27 plus the areas of all the rectangles to the right of this rectangle. (Only one of these rectangles can be seen on p. 96.) It can be shown (more on this later) that the rectangle centered at 2.27 has for its endpoints 2.04 and 2.51 (these look funny b/c of round-off error). The snc approximation to this P-value should be the area under the snc to the right of 2.04, which is 0.0207. This is a pretty good approximation, almost as good as we obtained with computer simulation.

BUT now I must tell you something strange about the rest of the world, outside this classroom. Every other book I have seen says, "It's too much work to find the 2.04 above; let's just calculate the area to the right of z = 2.27!" I am NOT making this up! The area to the right of 2.27 is 0.0116, which is a horrible approximation of the exact P-value.

To summarize, every other book I have seen, after spending all of Chapter 2 convincing you that the P-value is important, gets to Chapter 3 and says, "It is too much trouble to approximate it accurately." For now, let's focus on what all the lazy people do. Whereas the method gives a lousy answer, it is very easy to use.

1. Calculate z = x/σ.
2. For the alternative >: The approximate P-value is the area under the snc to the right of z.
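
Here is a sketch of the lazy method as a little function; it is my own paraphrase, not the text's. Rounding z to two decimals mimics looking it up in the printed table:

```python
from math import erfc, sqrt

def right_tail(z):
    return 0.5 * erfc(z / sqrt(2))   # area under the snc to the right of z

def lazy_p_value(x, sigma, alternative):
    """snc approximation of the P-value WITHOUT a continuity correction."""
    z = round(x / sigma, 2)          # round to two decimals, as with the printed table
    if alternative == '>':
        return right_tail(z)
    if alternative == '<':
        return right_tail(-z)        # area left of z = area right of -z
    return 2 * right_tail(abs(z))    # any other label: the two-sided alternative

print(round(lazy_p_value(0.2711, 0.1193, '>'), 4))   # 0.0116 for the CCD
print(round(lazy_p_value(-0.24, 0.1220, '<'), 4))    # 0.0244 for the BSS
```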

For the alternative <, the exact P-value is P(X ≤ x) = P(Z ≤ z), where z = x/σ. Thus, the lazy approximation is the area under the snc to the left of z, which is equal to the area under the snc to the right of −z. Finally, for the alternative ≠, we know that the P-value has two pieces to it. The approximation is: twice the area under the snc to the right of |z|.

This three-part rule (one for each alternative) is presented on p. 102 of the text. Its main advantage is that we calculate the same z regardless of the alternative and then look up z, −z or |z| in the table, remembering to double the area we find for the alternative ≠. Its main disadvantage is that it often gives horribly inaccurate approximations.

Of course, nobody else calls this the lazy method. They call it the normal (snc) approximation. Mostly, they don't acknowledge the existence of an exact P-value, thus avoiding the issue that their approximation is bad. If anything, they call the lazy method the approximation without the continuity correction and my better method the approximation with the continuity correction. There is an additional bit of perversity going on: the approximation with the continuity correction given in all books that give it is actually a very bad correction and gives answers barely different from the without. This helps them discount people like me who want to improve on their answers: by presenting an ineffective improvement, they send the message that improvements are not needed! In their defense, it is true that for very large studies the continuity correction does not change the answer much.

Thus, one can reasonably take the following approach: For small studies use the website to get the exact P-value, and for larger studies, if you use the snc, it does not matter much whether you use the continuity correction. (BTW, I am convinced that after some threshold for being a large study the website that we use actually uses an snc approximation.) When I have managed to challenge, in person, somebody who teaches the lazy method, they always say, "Oh, we only use the snc when the study is too large to get the exact P-value." But they taught the lazy method before there were computers readily available to obtain the exact answer.

Now I am going to, perhaps, surprise you. On our homework and especially on the midterm, I encourage you to use the lazy method. (Discuss.) If you submit project 1 (A or B) you will need to use the approximation with the continuity correction. Its steps are outlined on page 3 of the Course Notes on the course webpage. (Discuss projects briefly.)

Consider BSS. Recall that x = −0.24 and the alternative is <. Also,

    σ = √[38(12) / (25(25)(49))] = 0.1220.

Following page 3 of the Course Notes, g = δ/2 = 50/[2(25)(25)] = 0.04. Thus, x2 = x + g = −0.24 + 0.04 = −0.20, which standardized gives z2 = −0.20/0.1220 = −1.64. We look up 1.64 and find that the approximate P-value is 0.0505. W/o the c/c, z = −0.24/0.1220 = −1.97, giving an approximate P-value of 0.0244.
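
For project 1, page 3 of the Course Notes spells out the continuity-correction steps; the sketch below is my own paraphrase of that procedure (shift x by g = δ/2 toward 0, then standardize), checked against the BSS and IS numbers worked out here and on the next page. It is a sketch under those assumptions, not the official recipe:

```python
from math import erfc, sqrt

def right_tail(z):
    return 0.5 * erfc(z / sqrt(2))

def cc_p_value(x, sigma, delta, alternative):
    """snc approximation WITH the continuity correction: shift x by g = delta/2
    toward 0 before standardizing (my paraphrase of the Course Notes steps)."""
    g = delta / 2
    if alternative == '>':
        return right_tail(round((x - g) / sigma, 2))
    if alternative == '<':
        return right_tail(round(-(x + g) / sigma, 2))
    if abs(x) <= g:                  # two-sided: the exact P-value is 1, nothing to do
        return 1.0
    return 2 * right_tail(round((abs(x) - g) / sigma, 2))

# BSS: x = -0.24, sigma = 0.1220, delta = 50/(25*25) = 0.08, alternative '<'
print(round(cc_p_value(-0.24, 0.1220, 0.08, '<'), 4))   # 0.0505
# IS: x = 0.30, sigma = 0.2283 (assumed earlier), delta = 0.20, two-sided
print(round(cc_p_value(0.30, 0.2283, 0.20, '!='), 4))   # 0.3789 (the table gives 2(0.1894) = 0.3788)
```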

Let's look at CCD again. As I mentioned earlier in lecture, b/c of the nasty numbers for the n's we need to be more precise in our computations. In particular,

    p̂1 = 22/37 = 0.5946,  p̂2 = 11/34 = 0.3235,  x = 0.5946 − 0.3235 = 0.2711,
    g = δ/2 = 71/[2(37)(34)] = 0.0282.

Thus, x1 = x − g = 0.2711 − 0.0282 = 0.2429 and z1 = 0.2429/0.1193 = 2.04.

Finally, consider IS as an example of the alternative ≠. Recall that δ = 0.20, so g = 0.10. First, we compare |x| to g. If |x| ≤ g, then the exact P-value is 1 and no approximation is needed. In this case, x = 0.30, so we must continue. Next, x3 = |x| − g = 0.30 − 0.10 = 0.20. As stated earlier, σ = 0.2283, making z3 = 0.20/0.2283 = 0.88. The area to the right of 0.88 is 0.1894; doubling this, we get 0.3788 as the approximation to the exact P-value of 0.3699. W/o the c/c, z = 0.30/0.2283 = 1.31, giving an approximation of 2(0.0951) = 0.1902.

One somewhat positive comment about the lazy method: If all the margins are really large, w/ and w/o the c/c give about the same answers. For example, take n1 = n2 = 5000, m1 = 4000 and x = 0.02. One can verify that σ = 0.0098 and g = 0.0002. With the c/c, z = 0.0198/0.0098 ≈ 2.02 and w/o it, z = 0.02/0.0098 ≈ 2.04. But this is a huge study!

Chapter 5

We spent Chapters 2 and 3 examining the Skeptic's argument. The Skeptic makes no attempt to (formally) extend conclusions beyond the subjects in the study. For example, we concluded that cyclosporine was superior to placebo for the 71 people in the study. We concluded that Julie was better spinning right than left for the 50 trials in her study. Typically, for better or worse, researchers want to extend their conclusions to a larger situation. There are many techniques for such extensions, but central to every one is the notion of a population.

For our purposes, there are two types of populations: finite and infinite. And the word infinite here simply means: not what we call a finite population. (?)

A finite population is a well-defined collection of individuals. Examples: all persons who will vote in this year's presidential election; all persons eligible to vote in this year's presidential election; all persons in this room; all persons enrolled for one or more credits this semester at UW-Madison.

We need a way to think about a finite population. Imagine a box of cards, called the population box. Each member of the population has a card in the box. On the member's card are the values of one or more variables, or features, of the member. (Same features for all members.) For simplicity, we begin with one dichotomous feature per card. As in Chapter 1, the possible values of the feature are labeled success and failure. A 1 on a card denotes that the member is a success for the feature, and a 0 denotes a failure. Thus, every card has a 1 or a 0 on it.

Statisticians paraphrase Snoopy, who once famously said: "I love mankind; it's people I can't stand!" That is, statisticians are interested in the box in totality, not in any particular member's card.

For a given population box:
- Let s denote the number of cards in the box marked 1.
- Let f denote the number of cards in the box marked 0.
- Let N = s + f denote the total number of cards in the box.
- Let p = s/N denote the proportion of cards in the box marked 1.
- Let q = f/N denote the proportion of cards in the box marked 0.

These five numbers, s, f, N, p and q, tell us what is in the box. And, of course, knowledge of two of these numbers, N and p, allows us to determine the others. As a result, I will describe a box as Box(N; p). For example, Box(10; 0.60) is a box with N = 10 cards, of which a proportion p = 0.60 are successes.

In practice, a researcher does not know p and often does not even know N. There are two ways for a researcher to learn about what is in the population box. First is a census; this means one examines every card in the box. Discuss. Second is a survey; this means one examines only some of the cards in the box. Discuss. We will focus on surveys. The cards actually examined in a survey comprise the sample.

A sample is called representative if it looks like the box. Every (honest) researcher wants a representative sample, but, alas, there is no way to guarantee getting one. Let's consider this idea of representative again. Suppose that two researchers, A and B, each select samples of size n = 5 from Box(N; 0.60). Below are their samples:

    A: 1, 1, 0, 1, 1        B: 0, 1, 1, 0, 1.

Which sample is representative? B's, b/c its p̂ = p. Discuss Bill Clinton's cabinet.

We cannot guarantee a representative sample, so we advocate selecting what is called a random sample. A random sample (much like randomization) is a process and does not guarantee a sample that is representative, or necessarily even close to representative. Its great virtue is that a random sample allows us to calculate the probability that we will obtain a sample that is close to representative.

As stated earlier, in practice a researcher does not know p and often does not even know N. But, for now, let's assume that we know both of these numbers. Consider the chance mechanism of selecting n cards at random from Box(N; p). Imagine that we select the cards one at a time. But once we think of selecting the cards one by one, two ways of sampling come to mind: without replacement (smart), and with replacement (dumb). Why do I label these smart and dumb? Discuss.
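
A tiny sketch of the Box(N; p) idea; make_box and the seed are my own choices, not anything from the text:

```python
import random

def make_box(N, p):
    """Population box: N cards, a proportion p of them marked 1 (success)."""
    s = round(N * p)
    return [1] * s + [0] * (N - s)

box = make_box(10, 0.60)             # Box(10; 0.60): s = 6, f = 4, q = 0.40
print(box)

random.seed(0)                       # hypothetical seed, only for reproducibility
sample = random.sample(box, 5)       # a survey: examine n = 5 cards, without replacement
p_hat = sum(sample) / len(sample)
print(sample, p_hat)                 # the sample is representative exactly when p_hat = p
```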

[Figures: two probability histograms for X, the number of successes in a sample of size n = 10 from Box(1000; 0.60); solid [dashed] rectangles are for a random sample with [without] replacement. A second pair: as above, but from Box(20; 0.60).]

Note that in the first of these pictures, the probabilities are essentially the same for both methods of sampling, but for the second picture the probabilities are quite different. Also, note that the probabilities are better for the smarter method of sampling. Finally, note that for the dumb way of sampling, the probabilities do not depend on the value of N.

The key is the ratio n/N of sample size to population size. In the first example, this ratio is 10/1000 = 0.01 and in the second example, 10/20 = 0.50. The general guideline is: If n/N ≤ 0.05, then probabilities calculated w/replacement are approximately equal to probabilities calculated w/o replacement.

Why does any of this matter? Well, it turns out that it is much easier, both computationally and theoretically, to work with the probabilities for the dumb way of sampling. Thus, regardless of how you select a random sample, provided n/N ≤ 0.05, it is valid to calculate probabilities the easy way.

Extended enrichment example: It is sometimes better to sample the dumb way. Recall the CCD study. There is a huge number N of possible assignments of subjects to treatments. Let each possible assignment be a population member; as a group they form our (very, very large) finite population. Define an assignment to be a success if it would yield x ≥ 0.27. (Remember, the actual x = 0.27.) Thus, the p for this population box equals the P-value for the FT. Our computer simulation experiment selected a sample of n = 10,000 assignments in the dumb way. But it would be very difficult to write a computer program to select assignments the smart way (and it would require lots of memory and would run slowly). And, as our above result shows, the smart way and dumb way of sampling give (approximately) the same answers.

We will now investigate computing probabilities for the dumb way of sampling (with replacement). Recall that we plan to (probabilities are always about the future) select n cards at random with replacement from the population box. Define X1 to be the number on the first card selected (0 or 1); X2 to be the number on the second card selected; ...; and Xn to be the number on the nth (last) card selected.
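
A sketch comparing the two ways of sampling; the without-replacement probabilities use the usual counting (hypergeometric) formula, and the with-replacement ones use the binomial formula that appears later in this chapter. The function names are mine:

```python
from math import comb

def prob_without_repl(N, p, n, x):
    """P(X = x) sampling n cards WITHOUT replacement from Box(N; p)."""
    s = round(N * p)
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

def prob_with_repl(p, n, x):
    """P(X = x) sampling WITH replacement: this is Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# n/N = 0.01: the two methods nearly agree.  n/N = 0.50: they do not.
for N in (1000, 20):
    print(N, round(prob_without_repl(N, 0.60, 10, 6), 4),
             round(prob_with_repl(0.60, 10, 6), 4))
```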

Let's begin by computing probabilities for X1. Clearly, P(X1 = 1) = s/N = p and P(X1 = 0) = f/N = q. We can present these equations in a table:

    Value   Probability
    0       q
    1       p

This table presents the sampling distribution for X1. Upon reflection, this table is also the sampling distribution for X2, X3, ..., and Xn. We summarize by saying that X1, X2, X3, ..., and Xn are identically distributed. So, we can calculate probabilities for any individual card.

Next, we consider two cards simultaneously. For example, P(X1 = 1 and X2 = 0) = P(X1 = 1, X2 = 0). The result is the multiplication rule for probabilities. In this example, the multiplication rule says:

    P(X1 = 1, X2 = 0) = P(X1 = 1)P(X2 = 0) = pq.

In words, we replace the word "and" by the operation of multiplying. (Similarly, recall, the addition rule replaced "or" by adding.)

My argument for justifying the multiplication rule is a little tricky, so I will give it for a specific example, p = 0.60. For this p, P(X1 = 1, X2 = 0) = pq = 0.60(0.40) = 0.24. A brute-force argument is given in the text. The argument below appeals to the long-run interpretation of probability introduced earlier.

Consider the chance mechanism of selecting two cards at random w/replacement from the box. Now, imagine operating this chance mechanism a large number of times. We know that in the long run approximately 60% of the operations will give a first card of 1. And of those operations that first give a 1, approximately 40% will give a second card of 0. Thus, in the long run, 40% of 60% of the operations will give a 1 followed by a 0. Next, remember that a percent of a percent is computed by converting to decimals and multiplying: 0.40(0.60) = qp = 0.24.

The multiplication rule can be extended in two directions. First, it is true for any two cards, not just the first two. Thus, for example, P(X3 = 1, X7 = 1) = pp = p². Second, it is true for more than two cards; for example, P(X3 = 1, X6 = 0, X7 = 1) = pqp = p²q.

Do you recognize 4!? This is read "4-factorial" (you don't need to shout) and it is calculated by 4! = 4(3)(2)(1) = 24. Similarly, 3! = 3(2)(1) = 6 and 5! = 5(4)(3)(2)(1) = 120. These guys get big fast; for example, 50! is about 3.04 × 10^64. By special definition, 0! = 1.

Now, define X to be X1 + X2 + ... + Xn. Literally, X is the sum of the numbers on the n cards selected. But b/c each card has a 1 or a 0 on it, X can be interpreted as the total number of successes in the sample. The variable X is very important in scientific applications. Thus, we would like to know its sampling distribution. Fortunately, there is a simple (?) formula for it, given on page 159 of the text:
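
The long-run argument is easy to check by simulation. A sketch; the seed and the number of operations are arbitrary choices of mine:

```python
import random

# Long-run check of the multiplication rule for p = 0.60:
# P(X1 = 1, X2 = 0) should be close to pq = 0.60(0.40) = 0.24.
random.seed(0)                       # hypothetical seed, only for reproducibility
box = [1] * 6 + [0] * 4              # Box(10; 0.60)
runs = 100_000

hits = 0
for _ in range(runs):
    first = random.choice(box)       # with replacement: each draw sees the full box
    second = random.choice(box)
    if first == 1 and second == 0:
        hits += 1
print(hits / runs)                   # close to 0.24
```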

    P(X = x) = [n!/(x!(n − x)!)] p^x q^(n−x),   for x = 0, 1, ..., n.

This is a pretty amazing formula. It works for any choice of n and any value of p. It is called the binomial sampling distribution with parameters n and p, written Bin(n, p). In this class, you need to be able to evaluate this formula with a hand calculator for n ≤ 6.

For example, suppose we select n = 5 cards at random with replacement from a box with p = 0.60. What is the probability we will get a representative sample? First, we realize that p̂ = x/5 will equal p = 0.60 if, and only if, x = 3. Thus, we want to calculate P(X = 3):

    P(X = 3) = [5!/(3!(5 − 3)!)] (0.60)^3 (0.40)^(5−3) = 10(0.216)(0.16) = 0.3456.

For another example, for the same n and box, let's calculate P(X = 5). First, note that we can calculate this by using the multiplication rule. The event (X = 5) means that every card is a 1. Thus,

    P(X1 = 1, X2 = 1, X3 = 1, X4 = 1, X5 = 1) = p^5 = (0.60)^5 = 0.0778.

Using the binomial formula we get:

    P(X = 5) = [5!/(5!(5 − 5)!)] (0.60)^5 (0.40)^(5−5) = (120/120)(0.0778)(1) = 0.0778.

Note that we need the definition 0! = 1 so that the formula works.

There are many statistical software packages that will compute binomial probabilities for us. For example, in the text on page 161 I give the computer-generated probabilities for Bin(25, 0.50). Once we have binomial probabilities, we can draw probability histograms for binomials. Several such pictures are given in the text. These pictures illustrate the following facts about the binomial:
* δ = 1; thus, the height of each rectangle equals the probability of its center.
* Just like the pictures for FT, there is one central peak (one or two, but never more, rectangles wide) and the probabilities steadily decrease as you move away from the peak.
* The ph is symmetric if, and only if, p = 0.50.
* For p ≠ 0.50, the ph is roughly symmetric provided that both np and nq are large. People disagree on what "large" means; most say 5 or 10 or 15.

Yes, we can use a computer to calculate binomial probabilities, but a computer should not be seen as a panacea. For example, if I try to use my software package for Bin(130, 0.50) I get an error message. Unless a program is written with extreme care, the accurate calculation of, say, n! for large n is difficult. Thus, we might want to find a way to approximate binomial probabilities. Given the similarities of the binomial ph to the ph for FT, it is appealing to use the snc as an approximation.

To this end, note that it can be shown that for Bin(n, p), µ = np and σ = √(npq). Thus, for example, Bin(100, 0.50) has µ = 100(0.50) = 50 and σ = √[100(0.50)(0.50)] = √25 = 5. As a result, it is easy to standardize X:

    Z = (X − np)/√(npq).
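
A sketch of the formula exactly as displayed, factorials and all:

```python
from math import factorial, sqrt

def binomial_pmf(n, p, x):
    """The Bin(n, p) formula displayed above, written with factorials."""
    coeff = factorial(n) // (factorial(x) * factorial(n - x))
    return coeff * p**x * (1 - p)**(n - x)

print(round(binomial_pmf(5, 0.60, 3), 4))   # 0.3456: chance of a representative sample
print(round(binomial_pmf(5, 0.60, 5), 4))   # 0.0778: agrees with (0.60)**5

# Mean and standard deviation of Bin(n, p): mu = n*p, sigma = sqrt(n*p*q).
n, p = 100, 0.50
print(n * p, sqrt(n * p * (1 - p)))         # 50.0 5.0
```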

In the text are probability histograms for three Z's, each with an snc for comparison. Visually, it is clear that the snc can give good approximations for Z, and hence for X. The details will not be given and you are not responsible for them. This approximation will be used in Chapter 6.

Section 5.3: Bernoulli Trials

Consider the following experiments: Julie spins in circles to the right; Clyde shoots free throws; and Bob repeatedly tosses a coin. In each experiment, a person is conducting a sequence of trials. Consider the following question: Suppose that on Monday, Clyde attempts 100 free throws and achieves 77 successes. Clyde plans to attempt 200 free throws on Tuesday. What is the probability that he will make 150 or more free throws on Tuesday?

In order to answer this and similar questions, we need a mathematical model for the process that generates the results of the trials, whatever that means. We begin with a simple sequence of trials: repeated tosses of a fair coin. Think about this for a minute. What does it mean to you when you read "repeated tosses of a fair coin"? In particular, I want us to write down mathematical assumptions that describe this notion. There are three assumptions needed:

1. Each toss results in one of two outcomes: a heads or a tails.
2. The probability of heads is 0.50 for every toss.
3. The tosses exhibit no memory.

We have studied the chance mechanism of selecting cards at random, with replacement, from a population box. For this CM, we learned two very useful techniques: the multiplication rule for a particular sequence of outcomes and the binomial sampling distribution for the total number of successes. Is there any relationship between the assumptions above and selecting cards from a box? Well, imagine a box with two cards, one card marked 1 for heads and the other marked 0 for tails. Suppose that we select cards at random with replacement from this box. I claim that this selection of cards from this box satisfies the three assumptions given above. Discuss.

Thus, we can perform the following computations. If I toss a fair coin four times, the probability I get all heads is:

    P(H, H, H, H) = (0.50)^4 = 0.0625.

If I toss a fair coin eight times, the probability that I get a total of exactly six heads is:

    [8!/(6!2!)] (0.50)^8 = 28(0.0039) = 0.1094.

Next, we generalize the above assumptions.

Suppose that we have a sequence of trials. If they satisfy the following three assumptions, then we say that we have Bernoulli Trials (BT).

1. Each trial results in one of two outcomes: a success or a failure.
2. The probability of success equals p for every trial.
3. The trials exhibit no memory.

As argued above for a coin, BT are mathematically equivalent to selecting cards at random with replacement from Box(N; p). For example, Katie is a very good free throw shooter. On the assumption that Katie's free throws are BT with p = 0.85, we can calculate the following probabilities. If Katie shoots three free throws, the probability she makes all three is:

    P(S, S, S) = (0.85)^3 = 0.614.

If Katie shoots ten free throws, the probability she makes a total of exactly nine is:

    [10!/(9!1!)] (0.85)^9 (0.15) = 0.347.

To summarize: if we are told that we have BT and we are told the value of p, then we can calculate probabilities about the outcomes of the trials. What are the difficulties with this? Well, education should be more than learning to obey authority figures! I acknowledge this above by prefacing my computation by saying, "On the assumption ... are BT." But we can (sometimes) do more.

If we have previous outcomes of the trials, we can use these data to investigate (not determine!) whether the assumptions of BT seem reasonable. I offer two ways to investigate. One way is designed to explore the second assumption (constancy of the success probability) and the other focuses on the third assumption (lack of memory).

To be precise, suppose we have observed the following results of n = 10 trials (the sequence below is reconstructed from the counts discussed next):

    S F F F S S F F F S

We investigate constancy by creating the following table:

    Half     S   F   Total   p̂
    First    2   3   5       0.40
    Second   2   3   5       0.40
    Total    4   6   10      0.40

We need to be careful. We want to know whether p remains constant, but we never get to see p. We can see the p̂'s. For the table above, p̂1 = p̂2 = 0.40. B/c the p̂'s do not change from the first half to the second half, there is no evidence that p has changed. Of course, this argument is not conclusive. We could make it more formal, but that is not my goal. I encourage you to think of this as an informal hypothesis test for which the null hypothesis is that p remains constant.

Let's look at the trials again: Do you see memory? Well, the last five trials yield exactly the same outcomes as the first five, so this looks like memory. But it could be simply the result of chance.
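
Both checks are easy to automate. A sketch, using the reconstructed sequence above; the function names are mine:

```python
from collections import Counter

def half_table(trials):
    """p-hat in the first half and in the second half of a 0/1 sequence."""
    h = len(trials) // 2
    return sum(trials[:h]) / h, sum(trials[h:]) / (len(trials) - h)

def memory_table(trials):
    """Counts of (previous, current) pairs and p-hat after an S and after an F."""
    pairs = Counter(zip(trials[:-1], trials[1:]))
    p_after_S = pairs[(1, 1)] / (pairs[(1, 1)] + pairs[(1, 0)])
    p_after_F = pairs[(0, 1)] / (pairs[(0, 1)] + pairs[(0, 0)])
    return dict(pairs), p_after_S, p_after_F

# The (reconstructed) 10 trials above: S F F F S S F F F S.
trials = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
print(half_table(trials))     # (0.4, 0.4): no evidence that p changed
print(memory_table(trials))   # p-hat after S = p-hat after F = 1/3: no evidence of 1-step memory
```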

But now you need to be a scientist. Does this 5-step memory make any sense to you? Discuss. Often, what makes sense is 1-step memory; i.e., memory in which a current trial is influenced by the outcome of the trial immediately before it.

We take the 10 trials and form 9 overlapping pairs. (It is too difficult to do this w/my word processor; I will illustrate it on the board.) We create the following table. Note that in this table we are counting pairs, not trials.

    Previous    Current Trial
    Trial       S   F   Total   p̂
    S           1   2   3       0.33
    F           2   4   6       0.33
    Total       3   6   9       0.33

B/c the two p̂'s are equal, there is no evidence of 1-step memory. Discuss.

I want to show you an easier and faster way to create the above memory table. First, we know that the number of pairs will be (n − 1), which is nine for these data. Also, we know from previously comparing halves that there is a total of four successes and six failures. We put these numbers in the margins, yielding a partial table. A difficulty with this table, of course, is that 4 + 6 = 10, not 9. But look at the first trial in the sequence, a success. The first trial appears in only one pair b/c it has no trial before it. In other words, the first trial never gets to be current in the memory table. Thus, even though there are four successes in the sequence, only three of them are current. Thus, we change the column total for S to 3. Similarly, the last trial, also an S, never gets to be previous. Thus, we subtract one from the row total for S to get the following partial table, which has the correct margins:

    Previous    Current Trial
    Trial       S   F   Total
    S                   3
    F                   6
    Total       3   6   9

Now, we simply determine one of the entries in the table by counting, and obtain the others by subtracting. For example, the pair SS (11) occurs exactly once in the sequence.

Here is another example, with n = 20 trials; its halves table and memory table are constructed in the same way. Discuss.

Over the years, I have had trouble convincing my students that BT exist in the world. (Math majors are extremely willing to believe assumptions; non-math majors are not.) I decided to use the video game Tetris to convince my students that BTs do exist. Sadly, as we will see, I failed miserably.

I collected these data circa 1990 using a Nintendo system played on my television. A trial is a shape of blocks falling from the top of the screen. First difficulty: There are 7, not 2, shapes. Solution: Define the log shape to be an S; all others are F's. I played 8 games and observed n = 1872 trials. Instead of dividing the trials into halves, I created a table giving, for each of the eight plays (games), its numbers of S's and F's and its p̂.

Looking at this table and its plot, I was feeling pretty good. There is some evidence that p is not constant, but the evidence seems weak to me. (A formal HT of the null hypothesis that p is constant, versus the alternative that it can change every game, gives a very large P-value; see Chapter 11 for details.)

But then I created the memory table, giving the counts of the current shape (S or F) for each value of the previous shape (S or F). Note: The grand total for this table is (n − 8) b/c we lose one pair for each play. (I believe that it does not make sense to look for memory by pairing the last trial of one game with the first trial of the next game.) It is obvious (?) that there is memory. (The P-value is very small.)

One of my favorite projects: Describe the pet-turtle study.

Section 5.4: Some Practical Considerations

The big result of Chapter 5 is the multiplication rule, from which we obtain the binomial and many other results not discussed in this course. For a finite population, we get the MR if we sample at random w/replacement, and for an infinite population if we have BTs.

B/c humans, and I would say especially mathematicians, like answers, there is a tremendous pressure to assume/pretend/deceive that we have the MR. In my experience, it is very common for researchers to claim that they have a random sample from a finite population, even when they clearly do not. Examples: Pretty much every survey you ever read about. My Wisconsin DOT study. (Discuss.) Birthdays. Below is a very famous table in probability theory.

[Table: the famous birthday-problem probabilities, giving for each n the probability that at least two of n people share a birthday.]

Probabilities must be computed before the CM is operated. But despite my feeling this way, many people try to calculate probabilities after the CM has been operated. In my experience, the huge majority of such computations lead to gibberish. There are three major mistakes that people make. I call them: ignoring failed attempts; focusing on a winner; and inappropriate use of the ELC.

Multiple lotto winner: said to beat the odds of one in 16 trillion. Discuss.

Discuss the batting-order example. It was reported on TV that the probability of this happening was tiny. Let's put that number in perspective. There are currently 30 teams in MLB, playing 162 games per year, for a total of 2430 games per year. The above event is, thus, expected to happen once every 54,190,080 years. I really doubt that MLB will survive that long!

Yahtzee. Assuming that the five dice are balanced (ELC) and act independently (no memory from die to die), the probability of getting a Yahtzee on a single throw of the five dice is (1/6)^4 = 1/1296. Thus, if you pick up five dice and throw a Yahtzee, that is pretty great. But suppose you toss the dice 10 times per minute for two hours and get a Yahtzee; what then? Suppose that 12,960 people each throw five dice at once. Do you think there will be any Yahtzees? How many?
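
A sketch of the Yahtzee arithmetic, plus a long-run check; the seed and number of throws are arbitrary choices of mine:

```python
import random

# Yahtzee: the first die can land any way; each of the other four must match it.
print((1 / 6) ** 4, 1 / 1296)         # both print the same value, about 0.00077

random.seed(0)                        # hypothetical seed, only for reproducibility
runs = 500_000
yahtzees = sum(
    1 for _ in range(runs)
    if len({random.randint(1, 6) for _ in range(5)}) == 1   # all five dice equal
)
print(yahtzees / runs)                # long-run relative frequency, close to 1/1296

# 12,960 people each throwing five dice once: expected number of Yahtzees.
print(12960 * (1 / 1296))             # 10.0, so seeing one somewhere is no surprise
```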


More information

Experiment 2 Random Error and Basic Statistics

Experiment 2 Random Error and Basic Statistics PHY9 Experiment 2: Random Error and Basic Statistics 8/5/2006 Page Experiment 2 Random Error and Basic Statistics Homework 2: Turn in at start of experiment. Readings: Taylor chapter 4: introduction, sections

More information

Basic Probability. Introduction

Basic Probability. Introduction Basic Probability Introduction The world is an uncertain place. Making predictions about something as seemingly mundane as tomorrow s weather, for example, is actually quite a difficult task. Even with

More information

Statistics 1L03 - Midterm #2 Review

Statistics 1L03 - Midterm #2 Review Statistics 1L03 - Midterm # Review Atinder Bharaj Made with L A TEX October, 01 Introduction As many of you will soon find out, I will not be holding the next midterm review. To make it a bit easier on

More information

MITOCW ocw f99-lec17_300k

MITOCW ocw f99-lec17_300k MITOCW ocw-18.06-f99-lec17_300k OK, here's the last lecture in the chapter on orthogonality. So we met orthogonal vectors, two vectors, we met orthogonal subspaces, like the row space and null space. Now

More information

7.1 What is it and why should we care?

7.1 What is it and why should we care? Chapter 7 Probability In this section, we go over some simple concepts from probability theory. We integrate these with ideas from formal language theory in the next chapter. 7.1 What is it and why should

More information

Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman.

Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman. Math 224 Fall 2017 Homework 1 Drew Armstrong Problems from Probability and Statistical Inference (9th ed.) by Hogg, Tanis and Zimmerman. Section 1.1, Exercises 4,5,6,7,9,12. Solutions to Book Problems.

More information

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay Lecture - 15 Momentum Energy Four Vector We had started discussing the concept of four vectors.

More information

Appendix A. Review of Basic Mathematical Operations. 22Introduction

Appendix A. Review of Basic Mathematical Operations. 22Introduction Appendix A Review of Basic Mathematical Operations I never did very well in math I could never seem to persuade the teacher that I hadn t meant my answers literally. Introduction Calvin Trillin Many of

More information

Dealing with the assumption of independence between samples - introducing the paired design.

Dealing with the assumption of independence between samples - introducing the paired design. Dealing with the assumption of independence between samples - introducing the paired design. a) Suppose you deliberately collect one sample and measure something. Then you collect another sample in such

More information

MITOCW ocw-18_02-f07-lec02_220k

MITOCW ocw-18_02-f07-lec02_220k MITOCW ocw-18_02-f07-lec02_220k The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free.

More information

Grades 7 & 8, Math Circles 10/11/12 October, Series & Polygonal Numbers

Grades 7 & 8, Math Circles 10/11/12 October, Series & Polygonal Numbers Faculty of Mathematics Waterloo, Ontario N2L G Centre for Education in Mathematics and Computing Introduction Grades 7 & 8, Math Circles 0//2 October, 207 Series & Polygonal Numbers Mathematicians are

More information

Guide to Proofs on Sets

Guide to Proofs on Sets CS103 Winter 2019 Guide to Proofs on Sets Cynthia Lee Keith Schwarz I would argue that if you have a single guiding principle for how to mathematically reason about sets, it would be this one: All sets

More information

DIFFERENTIAL EQUATIONS

DIFFERENTIAL EQUATIONS DIFFERENTIAL EQUATIONS Basic Concepts Paul Dawkins Table of Contents Preface... Basic Concepts... 1 Introduction... 1 Definitions... Direction Fields... 8 Final Thoughts...19 007 Paul Dawkins i http://tutorial.math.lamar.edu/terms.aspx

More information

Manipulating Radicals

Manipulating Radicals Lesson 40 Mathematics Assessment Project Formative Assessment Lesson Materials Manipulating Radicals MARS Shell Center University of Nottingham & UC Berkeley Alpha Version Please Note: These materials

More information

Special Theory Of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay

Special Theory Of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay Special Theory Of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay Lecture - 6 Length Contraction and Time Dilation (Refer Slide Time: 00:29) In our last lecture,

More information

Conditional Probability, Independence and Bayes Theorem Class 3, Jeremy Orloff and Jonathan Bloom

Conditional Probability, Independence and Bayes Theorem Class 3, Jeremy Orloff and Jonathan Bloom Conditional Probability, Independence and Bayes Theorem Class 3, 18.05 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of conditional probability and independence of events. 2.

More information

Lecture 4: Constructing the Integers, Rationals and Reals

Lecture 4: Constructing the Integers, Rationals and Reals Math/CS 20: Intro. to Math Professor: Padraic Bartlett Lecture 4: Constructing the Integers, Rationals and Reals Week 5 UCSB 204 The Integers Normally, using the natural numbers, you can easily define

More information

MITOCW ocw f99-lec09_300k

MITOCW ocw f99-lec09_300k MITOCW ocw-18.06-f99-lec09_300k OK, this is linear algebra lecture nine. And this is a key lecture, this is where we get these ideas of linear independence, when a bunch of vectors are independent -- or

More information

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.] Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the

More information

Math 3361-Modern Algebra Lecture 08 9/26/ Cardinality

Math 3361-Modern Algebra Lecture 08 9/26/ Cardinality Math 336-Modern Algebra Lecture 08 9/26/4. Cardinality I started talking about cardinality last time, and you did some stuff with it in the Homework, so let s continue. I said that two sets have the same

More information

MAT Mathematics in Today's World

MAT Mathematics in Today's World MAT 1000 Mathematics in Today's World Last Time We discussed the four rules that govern probabilities: 1. Probabilities are numbers between 0 and 1 2. The probability an event does not occur is 1 minus

More information

Chapter 14. Simple Linear Regression Preliminary Remarks The Simple Linear Regression Model

Chapter 14. Simple Linear Regression Preliminary Remarks The Simple Linear Regression Model Chapter 14 Simple Linear Regression 14.1 Preliminary Remarks We have only three lectures to introduce the ideas of regression. Worse, they are the last three lectures when you have many other things academic

More information

STAT 285 Fall Assignment 1 Solutions

STAT 285 Fall Assignment 1 Solutions STAT 285 Fall 2014 Assignment 1 Solutions 1. An environmental agency sets a standard of 200 ppb for the concentration of cadmium in a lake. The concentration of cadmium in one lake is measured 17 times.

More information

79 Wyner Math Academy I Spring 2016

79 Wyner Math Academy I Spring 2016 79 Wyner Math Academy I Spring 2016 CHAPTER NINE: HYPOTHESIS TESTING Review May 11 Test May 17 Research requires an understanding of underlying mathematical distributions as well as of the research methods

More information

Basics of Proofs. 1 The Basics. 2 Proof Strategies. 2.1 Understand What s Going On

Basics of Proofs. 1 The Basics. 2 Proof Strategies. 2.1 Understand What s Going On Basics of Proofs The Putnam is a proof based exam and will expect you to write proofs in your solutions Similarly, Math 96 will also require you to write proofs in your homework solutions If you ve seen

More information

Rules for Means and Variances; Prediction

Rules for Means and Variances; Prediction Chapter 7 Rules for Means and Variances; Prediction 7.1 Rules for Means and Variances The material in this section is very technical and algebraic. And dry. But it is useful for understanding many of the

More information

Sequences and infinite series

Sequences and infinite series Sequences and infinite series D. DeTurck University of Pennsylvania March 29, 208 D. DeTurck Math 04 002 208A: Sequence and series / 54 Sequences The lists of numbers you generate using a numerical method

More information

Lesson 1 WHAT S A GROUP?

Lesson 1 WHAT S A GROUP? WHAT S A GROUP? What is a group? Well, that s a question we can begin to answer in several different ways. First and foremost, we can say that groups are composed of cycles and that s why groups are so

More information

Mathematics for Health and Physical Sciences

Mathematics for Health and Physical Sciences 1 Mathematics for Health and Physical Sciences Collection edited by: Wendy Lightheart Content authors: Wendy Lightheart, OpenStax, Wade Ellis, Denny Burzynski, Jan Clayton, and John Redden Online:

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

Creating and Exploring Circles

Creating and Exploring Circles Creating and Exploring Circles 1. Close your compass, take a plain sheet of paper and use the compass point to make a tiny hole (point) in what you consider to be the very centre of the paper. The centre

More information

Roberto s Notes on Linear Algebra Chapter 11: Vector spaces Section 1. Vector space axioms

Roberto s Notes on Linear Algebra Chapter 11: Vector spaces Section 1. Vector space axioms Roberto s Notes on Linear Algebra Chapter 11: Vector spaces Section 1 Vector space axioms What you need to know already: How Euclidean vectors work. What linear combinations are and why they are important.

More information

2. Probability. Chris Piech and Mehran Sahami. Oct 2017

2. Probability. Chris Piech and Mehran Sahami. Oct 2017 2. Probability Chris Piech and Mehran Sahami Oct 2017 1 Introduction It is that time in the quarter (it is still week one) when we get to talk about probability. Again we are going to build up from first

More information

Chapter 1. Probability. 1.1 The Standard Normal Curve.

Chapter 1. Probability. 1.1 The Standard Normal Curve. 0 Chapter 1 Probability 1.1 The Standard Normal Curve. In this section we will learn how to use an approximating device that we will need throughout the semester. The standard normal curve, henceforth

More information

1 What is the area model for multiplication?

1 What is the area model for multiplication? for multiplication represents a lovely way to view the distribution property the real number exhibit. This property is the link between addition and multiplication. 1 1 What is the area model for multiplication?

More information