Rules for Means and Variances; Prediction


Chapter 7

Rules for Means and Variances; Prediction

7.1 Rules for Means and Variances

The material in this section is very technical and algebraic. And dry. But it is useful for understanding many of the methods we will learn later in this course.

We have random variables X_1, X_2, ..., X_n. Throughout this section, we will assume that these random variables are independent. Sometimes they will also be identically distributed, but we don't need identically distributed for our main result. (There is a similar result without independence too, but we won't need it.)

Let µ_i denote the mean of X_i. Let σ²_i denote the variance of X_i. Let b_1, b_2, ..., b_n denote n numbers. Define

W = b_1·X_1 + b_2·X_2 + ... + b_n·X_n.

W is a linear combination of the X_i's. The main result is:

The mean of W is µ_W = b_1µ_1 + b_2µ_2 + ... + b_nµ_n.

The variance of W is σ²_W = b_1²σ²_1 + b_2²σ²_2 + ... + b_n²σ²_n.

Special Cases

1. The i.i.d. case. If the sequence is i.i.d. then we can write µ = µ_i and σ² = σ²_i for every i. In this case the mean of W is µ_W = (b_1 + ... + b_n)µ and the variance of W is σ²_W = (b_1² + ... + b_n²)σ².

2. Two independent rv's. If n = 2, then we usually call the random variables X and Y instead of X_1 and X_2. We get W = b_1·X + b_2·Y, which has mean µ_W = b_1µ_X + b_2µ_Y and variance σ²_W = b_1²σ²_X + b_2²σ²_Y.

3. Two i.i.d. rv's. Combining the notation of the previous two items, W = b_1·X + b_2·Y has mean µ_W = (b_1 + b_2)µ and variance σ²_W = (b_1² + b_2²)σ². Especially important is the case W = X + Y, which has mean µ_W = 2µ and variance σ²_W = 2σ². Another important case is W = X − Y, which has mean µ_W = 0 and variance σ²_W = 2σ².
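The main result is easy to check numerically. Below is a minimal simulation sketch in Python (the particular coefficients and distributions are arbitrary choices of mine, not from the text), comparing the sample mean and variance of W = b_1·X_1 + b_2·X_2 with the formulas above.

```python
import random

random.seed(0)

b1, b2 = 3.0, -2.0
# X1 and X2 independent normals: X1 has mean 5, sd 2; X2 has mean 1, sd 4.
mu1, sd1, mu2, sd2 = 5.0, 2.0, 1.0, 4.0

n_runs = 100_000
w = [b1 * random.gauss(mu1, sd1) + b2 * random.gauss(mu2, sd2)
     for _ in range(n_runs)]

mean_w = sum(w) / n_runs
var_w = sum((x - mean_w) ** 2 for x in w) / n_runs

# The rules: mu_W = b1*mu1 + b2*mu2 and sigma^2_W = b1^2*sd1^2 + b2^2*sd2^2.
theory_mean = b1 * mu1 + b2 * mu2             # 13.0 exactly
theory_var = b1**2 * sd1**2 + b2**2 * sd2**2  # 100.0 exactly

print(round(mean_w, 1), round(var_w, 0))
```

With 100,000 runs the sample mean and variance land very close to the theoretical 13.0 and 100.0.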

7.2 Predicting for Bernoulli Trials

"Predictions are tough, especially about the future." — Yogi Berra.

We plan to observe m BT and want to predict the total number of successes that we will get. Let Y denote the random variable and y the observed value of the total number of successes in the future m trials. Similar to estimation, we will learn about point and interval predictions.

When p is Known

We begin with point prediction of Y; we denote the point prediction by ŷ. We adopt the criterion that we want the probability of being correct to be as large as possible. Below is the result.

Calculate the mean of Y, which is mp. If mp is an integer, then it is the most probable value of Y and our prediction is ŷ = mp. Here are some examples.

Suppose that m = 20 and p = 0.5. Then, mp = 20(0.5) = 10 is an integer, so 10 is our point prediction of Y. With the help of our website calculator (details not given), we find that P(Y = 10) = 0.176.

Suppose that m = 200 and p = 0.5. Then, mp = 200(0.5) = 100 is an integer, so 100 is our point prediction of Y. With the help of our website calculator, we find that P(Y = 100) = 0.056.

Suppose that m = 300 and p = 0.3. Then, mp = 300(0.3) = 90 is an integer, so 90 is our point prediction of Y. With the help of our website calculator, we find that P(Y = 90) = 0.050.

If mp is not an integer, then it can be shown that the most probable value of Y is one of the two integers immediately on either side of mp. Just check them both. Here are some examples.

Suppose that m = 20 and p = 0.42. Then, mp = 20(0.42) = 8.4 is not an integer. The most likely value of Y is either 8 or 9. With the help of our website calculator, we find that P(Y = 8) = 0.177 and P(Y = 9) = 0.171. Thus, ŷ = 8.

Suppose that m = 100 and p = 0.615. Then, mp = 100(0.615) = 61.5 is not an integer. The most likely value of Y is either 61 or 62. With the help of our website calculator, we find that P(Y = 62) is slightly larger than P(Y = 61). Thus, ŷ = 62.

In each of the above examples we saw that the probability that a point prediction is correct is very small.
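The website calculator is not reproduced in these notes, but the binomial probabilities above are easy to compute directly. A minimal sketch in Python (the function name is mine):

```python
from math import comb

def binom_pmf(y, m, p):
    """P(Y = y) for Y ~ Bin(m, p)."""
    return comb(m, y) * p**y * (1 - p)**(m - y)

# m = 20, p = 0.5: mp = 10 is an integer, so y-hat = 10.
print(binom_pmf(10, 20, 0.5))   # about 0.176

# m = 20, p = 0.42: mp = 8.4 is not an integer, so check both 8 and 9.
p8 = binom_pmf(8, 20, 0.42)
p9 = binom_pmf(9, 20, 0.42)
y_hat = 8 if p8 >= p9 else 9
print(y_hat)                    # 8
```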
As a result, scientists usually prefer a prediction interval. It is possible to create a one-sided prediction interval, but we will consider only two-sided prediction intervals. We have two choices: using an SNC approximation or finding an exact interval. Even if you choose the exact interval, it is useful to begin with the SNC approximation.

The SNC approximation says to use the interval

mp ± z√(mpq),

where z is the same number as we used for the two-sided confidence interval for p. Here is an example. Suppose that m = 100 and p = 0.615. The SNC approximate 95% prediction interval is

61.5 ± 1.96√(100(0.615)(0.385)) = 61.5 ± 9.54 = [51.96, 71.04].

Now, it makes no sense to predict a fractional number of successes, so we will round off the endpoints to get [52, 71]. We can use our binomial calculator to see whether the SNC approximation is any good. Doing this, we find that the exact probability that Y will be in the interval [52, 71] is quite close to the desired 0.95. If this answer had been importantly too small or too large, we could modify either or both endpoints to get a more desirable answer. The point is that the SNC approximation will either give us a good answer (as in this case) or help us find a good answer.

When p is Unknown

We now consider the situation in which p is unknown. We will begin with point prediction. The first problem is that we cannot achieve our criterion's goal: we cannot find the most probable value of Y. The most probable value, as we saw above, is at or near mp, but we don't know what p is. Thus, we adopt an ad hoc approach. We simply decide to use mp̂ as our point prediction, where p̂ is our point estimate of p. Of course, if mp̂ is not an integer, we need to round off to a nearby integer. Because this is ad hoc, I say just round to the nearest integer. If two integers are equally close, round to the one that is an even number.

But where did p̂ come from? Well, we need to add another ingredient to our procedure. We assume that we have past data from the CM that will generate the m future trials. We denote the past data as consisting of n trials which yielded x successes, giving p̂ = x/n as our point estimate of the unknown p. As I said above, our point prediction is mp̂ = mx/n. This answer is ad hoc; we use it because it seems sensible.
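The p-known interval above, together with the exact check, can be sketched in a few lines of Python (names are mine; endpoints are rounded to the nearest integer):

```python
from math import comb, sqrt

def binom_cdf(y, m, p):
    """P(Y <= y) for Y ~ Bin(m, p)."""
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(y + 1))

m, p, z = 100, 0.615, 1.96
q = 1 - p
lo = m * p - z * sqrt(m * p * q)
hi = m * p + z * sqrt(m * p * q)
lo_i, hi_i = round(lo), round(hi)

# Exact probability that Y lands in the rounded interval; close to 0.95.
coverage = binom_cdf(hi_i, m, p) - binom_cdf(lo_i - 1, m, p)
print(lo_i, hi_i, round(coverage, 3))
```

If the exact coverage had come out importantly too small or too large, the endpoints could be adjusted by hand exactly as described in the text.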
This approach suggests taking our prediction interval for p known, mp ± z√(mpq), and simply changing it to mp̂ ± z√(mp̂q̂) and saying, "It's ad hoc." We cannot do this! Why? Well, because we want an interval such that the probability is, for example, 95% that the interval will be correct; i.e., contain the eventual value of Y. It turns out, and you can show this with a simulation study if you want, that the interval mp̂ ± z√(mp̂q̂) makes too many mistakes; i.e., the probability it will be correct is smaller than the target associated with the choice of z.

Instead of an ad hoc procedure that does not perform as desired, we directly derive an answer using the result of Section 7.1. We begin by writing

W = Y − mp̂ = Y − (m/n)X.

We now apply our result of Section 7.1 with: X_1 = Y, X_2 = X, b_1 = 1 and b_2 = −(m/n).

We can calculate the mean and variance of W:

µ_W = µ_Y − (m/n)µ_X = mp − (m/n)(np) = 0, and

σ²_W = σ²_Y + (m/n)²σ²_X = mpq + (m/n)²(npq) = mpq[1 + (m/n)].

Thus, we can standardize W:

Z = (W − µ_W)/σ_W = (W − 0)/√(mpq[1 + (m/n)]) = (Y − mp̂)/√(mpq[1 + (m/n)]).

It can be shown that if m and n are both large and p is not too close to either 0 or 1, then probabilities for Z can be well approximated by the SNC. Slutsky's work (see Chapter 3) can also be applied here; the result is:

Z′ = (Y − mp̂)/√(mp̂q̂[1 + (m/n)]).

It can be shown that if m and n are both large and p is not too close to either 0 or 1, then probabilities for Z′ can be well approximated by the SNC. As a result, using the same algebra we had in Chapter 3 (the names have changed) we can expand Z′ and get the following prediction interval for Y:

mp̂ ± z√(mp̂q̂[1 + (m/n)]).  (7.1)

I will illustrate the use of this formula with real data from basketball. Years ago Katie Voigt, a very talented basketball player, took an independent study course from me. In exchange for my help, she graciously provided me with the results of 2000 three-point shots that she attempted in basketball. She obtained her data by performing 100 shots every day for a total of 20 days. I will use some of Katie's data to illustrate the current method.

Katie plans to attempt m = 100 three-point shots. I am going to assume that Katie's shots are BT, but I don't know the numerical value of p. I have previous data for Katie in which she attempted n = 200 shots and achieved a total of x = 115 successes. This gives us p̂ = 115/200 = 0.575. The point prediction of Y is mp̂ = 100(0.575) = 57.5, rounded to 58, an even number. Next, I will obtain the 95% prediction interval for Y:

57.5 ± 1.96√(100(0.575)(0.425)[1 + (100/200)]) = 57.5 ± 1.96(4.94)(1.225) = 57.5 ± 11.9 = [45.6, 69.4],

which I round to [46, 69]. This is a very wide interval. (Why do I say this? Well, in my opinion, a basketball player will believe that making 46 out of 100 is seriously different than making 69 out of 100.)
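Formula (7.1) is simple enough to wrap in a short Python function; this sketch (the function name is mine) reproduces the computation for Katie's data:

```python
from math import sqrt

def prediction_interval(m, n, x, z=1.96):
    """Approximate PI (7.1) for Y, the number of successes in m future
    Bernoulli trials, given x successes in n past trials.
    Endpoints are rounded to the nearest integer."""
    p_hat = x / n
    q_hat = 1 - p_hat
    center = m * p_hat
    half = z * sqrt(m * p_hat * q_hat * (1 + m / n))
    return round(center - half), round(center + half)

# Katie: m = 100 future shots; past data x = 115 successes in n = 200 shots.
print(prediction_interval(100, 200, 115))   # (46, 69)
```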
A great thing about prediction (not shared by estimation or testing) is that we find out whether our answer is correct. It is no longer the case that "Only Nature knows!" In particular, when Katie attempted the m = 100 future shots, she obtained y = 66 successes. Our point prediction, 58, was too small, but our prediction interval is correct because it includes y = 66.

You should note that there is no exact solution to this problem. It is, indeed, an extremely complicated computation. Even a simulation study is a challenge. I will discuss how a simulation study is done and then show you how it works.

In order to do a simulation study we must specify three numbers: m, n and p. (Even though the researcher does not know p, Nature, the great simulator, must know it.) And, as we shall see, it is a two-stage simulation. To be specific, I will simulate something similar to Katie's problem. I will take m = 100, n = 200 and p = 0.60. (Of course, I don't know Katie's p, but 0.60 seems to be not ridiculous.) A single run of the simulation study is as follows.

1. Simulate the value of X ~ Bin(200, 0.60), our simulated previous data for Katie.

2. Use our value of X from step 1 to estimate p and then compute the 95% prediction interval for Y, remembering to use the interval for p unknown.

3. Simulate the value of Y ~ Bin(100, 0.60) and see whether or not the interval from step 2 is correct.

I performed this simulation with 10,000 runs and obtained 9493 (94.93%) correct prediction intervals! This is very close to the nominal (advertised) rate of 95% correct.

7.3 *Predicting for the Poisson (Optional)

We plan to observe the value of the random variable Y ~ Poisson(θ).

When θ is Known

We begin with point prediction of Y. We adopt the criterion that we want the probability of being correct to be as large as possible. Below is the result.

If θ is an integer, then y = θ and y = θ − 1 are equally likely to occur and either one is more likely than any other value of Y. For uniformity, we will use whichever of θ and θ − 1 is an even number as our point prediction. For example, if θ = 10, it can be shown by using our Poisson calculator that P(Y = 10) = P(Y = 9) = 0.125. Our point prediction is ŷ = 10.

If θ is not an integer, then there is a unique most likely value of Y and it is ŷ = the greatest integer in θ. For example, if θ = 7.6, then the greatest integer in θ is 7. With the help of our calculator, P(Y = 7) = 0.145 is larger than either P(Y = 6) = 0.134 or P(Y = 8) = 0.138. Indeed, y = 7 is the most likely of all values for Y; thus, ŷ = 7. If θ is very close to an integer these results converge, as seen in the following example.
If θ is slightly less than 8, for example, then P(Y = 7) is only slightly larger than P(Y = 8).

As the above examples indicate, the probability that a point prediction will be correct is quite small. Thus, we want a prediction interval (PI) too. We follow the approach taken earlier for BT: we calculate the PI based on using the SNC approximation and then use our calculator to see whether our answer needs improvement. We begin with the facts that the mean and variance of Y are both equal to θ. Approximating the standardized Poisson by the SNC, we get the following PI:

θ ± z√θ.  (7.2)

Below are a couple of examples.

1. Suppose that θ = 200 and we want 95% probability of being correct. First, 95% gives z = 1.96 and our PI is:

200 ± 1.96√200 = 200 ± 27.7.

After rounding, our PI is [172, 228]. With the help of our calculator, we find that for θ = 200, P(172 ≤ Y ≤ 228) = 0.956. We note that the actual probability, 0.956, is somewhat larger than our target value, 0.95. Most days I would decide, "I don't care about the difference." But it's possible that I will think, "Perhaps I can make the interval narrower and still achieve 95%." (Why would I prefer a narrower interval?) With this in mind, I decide to shorten either end of the answer I have, [172, 228]. I find that both P(173 ≤ Y ≤ 228) and P(172 ≤ Y ≤ 227) still achieve the target of 95%; I would use [173, 228] because it gives a very slightly higher chance of being correct, but there really is not much basis for choosing between these. Note that if we shorten this last interval further, either to [174, 228] or [173, 227], then (details not shown) the probability of being correct drops below 0.95.

2. Suppose that θ = 5 and we want 95% probability of being correct. As above, 95% gives z = 1.96 and our PI is:

5 ± 1.96√5 = 5 ± 4.4.

After rounding, our PI is [1, 9]. With the help of our calculator, we find that for θ = 5,

P(1 ≤ Y ≤ 9) = 0.961.

Moreover,

P(1 ≤ Y ≤ 8) = 0.925 and P(2 ≤ Y ≤ 9) = 0.928,

both of which fall below 0.95. Thus, [1, 9] is our answer.

When θ is Unknown

We have previous data on X ~ Poisson(θ_1) with observed value x. We assume a known relationship between the parameters: θ = cθ_1, for some known number c. This is, perhaps, easiest to visualize for a Poisson Process. For example, suppose that we have a Poisson Process with unknown rate λ per hour. We define X to be the number of successes in a (past) observation of the Poisson Process for a known length of time t_1 hours. We define Y to

be the number of successes in a (future) observation of the Poisson Process for a known length of time t_2 hours. With this notation, θ_1 = t_1·λ, θ = t_2·λ and θ = (t_2/t_1)θ_1.

We begin with our ad hoc point prediction. The idea is that we estimate θ_1 by x, so our estimate of θ is cx. If cx is an integer, then it is my point prediction. If cx is not an integer, I round it to the nearest integer and that is my point prediction.

For the PI, we proceed as follows. Define W = Y − cX. Using the results of the first section, the mean of W is

µ_W = µ_Y − cµ_X = θ − cθ_1 = θ − θ = 0,

and the variance of W is

σ²_W = σ²_Y + c²σ²_X = θ + c²θ_1 = θ + c²(θ/c) = θ(1 + c).

Thus, the standardized version of W is

Z = (Y − cX)/√(θ(1 + c)).

It can be shown that if θ and cθ are both large, then probabilities for Z can be well approximated by using the SNC. Slutsky's work also applies here. In particular, let

Z′ = (Y − cX)/√(cX(1 + c)).

It can be shown that if θ and cθ are both large, then probabilities for Z′ can be well approximated by using the SNC. We can invert the formula for Z′ to obtain the following PI for Y, when θ is unknown:

cx ± z√(cx(1 + c)).  (7.3)

Below are a couple of examples of how to use this formula.

1. Paula wants to predict the number of successes observed during 100 hours of a Poisson Process with unknown rate. From past data, 40 hours yielded 120 successes. Find the point prediction and 90% PI for Paula.

Solution: The relationship between the parameter for the future, θ, and the parameter for the past data, θ_1, is θ = (100/40)θ_1 = 2.5θ_1. Thus, c = 2.5. For the point estimate, we calculate cx = 2.5(120) = 300. Thus, the ad hoc point prediction for Y is ŷ = 300. The goal of 90% probability of being correct gives z = 1.645. Thus, the PI is

cx ± z√(cx(1 + c)) = 300 ± 1.645√(300(1 + 2.5)) = 300 ± 53.3 = [247, 353],

after rounding.

2. This example is similar to the previous one; its purpose is to examine the effect of having much more data for estimating the unknown θ. Mary wants to predict the number of successes observed during 100 hours of a Poisson Process with unknown rate. From past data, 400 hours yielded 1200 successes. Find the point prediction and 90% PI for Mary.

Solution: The relationship between the parameter for the future, θ, and the parameter for the past data, θ_1, is θ = (100/400)θ_1 = 0.25θ_1. Thus, c = 0.25. For the point estimate, we calculate cx = 0.25(1200) = 300. Thus, the ad hoc point prediction for Y is ŷ = 300. The goal of 90% probability of being correct gives z = 1.645. Thus, the PI is

cx ± z√(cx(1 + c)) = 300 ± 1.645√(300(1 + 0.25)) = 300 ± 31.9 = [268, 332],

after rounding.

As with BT with p unknown, there is no exact solution to this problem. It is, indeed, an extremely complicated computation. Even a simulation study is a challenge. I will discuss how a simulation study is done and then show you how it works.

In order to do a simulation study we must specify two numbers: c and θ_1. And, as we shall see, it is a two-stage simulation. To be specific, I will simulate something similar to the first of our above examples. I will take c = 2.5 and θ_1 = 120.

1. Simulate the value of X ~ Poisson(120).

2. Use our value of X from step 1 to compute the 90% prediction interval for Y; remember to use the formula for θ unknown.

3. Simulate the value of Y ~ Poisson(300) and see whether or not the interval from step 2 is correct.

I performed this simulation with 10,000 runs and obtained 9003 (90.03%) correct prediction intervals! This is almost a perfect match to the target value of 90% correct.
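Formula (7.3) and the two-stage simulation study can both be sketched in plain Python. The function names and the Knuth-style Poisson sampler below are my own (a library random-variate generator would do just as well); the parameter values are the ones from the examples above.

```python
import random
from math import exp, sqrt

random.seed(2)

def poisson_pi(c, x, z):
    """Prediction interval (7.3) for Y: past count x from Poisson(theta_1),
    future mean theta = c * theta_1; endpoints rounded to integers."""
    half = z * sqrt(c * x * (1 + c))
    return round(c * x - half), round(c * x + half)

# Paula: c = 100/40 = 2.5, x = 120, 90% PI.
print(poisson_pi(2.5, 120, 1.645))    # (247, 353)
# Mary: c = 100/400 = 0.25, x = 1200, 90% PI.
print(poisson_pi(0.25, 1200, 1.645))  # (268, 332)

def rpois(theta):
    """Poisson draw via Knuth's product-of-uniforms method (fine for moderate theta)."""
    limit, prod, k = exp(-theta), random.random(), 0
    while prod > limit:
        prod *= random.random()
        k += 1
    return k

# Two-stage simulation with c = 2.5 and theta_1 = 120, as in the text.
c, theta1, z, runs, correct = 2.5, 120, 1.645, 1000, 0
for _ in range(runs):
    x = rpois(theta1)             # step 1: simulate the past data
    lo, hi = poisson_pi(c, x, z)  # step 2: the 90% PI for Y
    y = rpois(c * theta1)         # step 3: simulate the future
    correct += (lo <= y <= hi)
print(correct / runs)             # close to the nominal 0.90
```

With 1,000 runs the observed proportion of correct intervals should land near 90%, matching the 90.03% reported in the text.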


More information

CIS 2033 Lecture 5, Fall

CIS 2033 Lecture 5, Fall CIS 2033 Lecture 5, Fall 2016 1 Instructor: David Dobor September 13, 2016 1 Supplemental reading from Dekking s textbook: Chapter2, 3. We mentioned at the beginning of this class that calculus was a prerequisite

More information

MTH 452 Mathematical Statistics

MTH 452 Mathematical Statistics MTH 452 Mathematical Statistics Instructor: Orlando Merino University of Rhode Island Spring Semester, 2006 1 5.1 Introduction An Experiment: In 10 consecutive trips to the free throw line, a professional

More information

Math 138: Introduction to solving systems of equations with matrices. The Concept of Balance for Systems of Equations

Math 138: Introduction to solving systems of equations with matrices. The Concept of Balance for Systems of Equations Math 138: Introduction to solving systems of equations with matrices. Pedagogy focus: Concept of equation balance, integer arithmetic, quadratic equations. The Concept of Balance for Systems of Equations

More information

Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2:

Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2: Math101, Sections 2 and 3, Spring 2008 Review Sheet for Exam #2: 03 17 08 3 All about lines 3.1 The Rectangular Coordinate System Know how to plot points in the rectangular coordinate system. Know the

More information

Conditional probabilities and graphical models

Conditional probabilities and graphical models Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within

More information

7.1 Basic Properties of Confidence Intervals

7.1 Basic Properties of Confidence Intervals 7.1 Basic Properties of Confidence Intervals What s Missing in a Point Just a single estimate What we need: how reliable it is Estimate? No idea how reliable this estimate is some measure of the variability

More information

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1 Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

More information

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ). CMPSCI611: Verifying Polynomial Identities Lecture 13 Here is a problem that has a polynomial-time randomized solution, but so far no poly-time deterministic solution. Let F be any field and let Q(x 1,...,

More information

Mixed random variables and the the importance of distribution function

Mixed random variables and the the importance of distribution function Agenda Lecture 26 1. Mixed random variables and the the importance of distribution function 2. Joint Probability Distribution for discrete random variables Mixed random variables and the the importance

More information

Adding and Subtracting Polynomials

Adding and Subtracting Polynomials Exploratory Exercise Kim was working on a problem in math when she ran across this problem. Distribute and simplify if possible. 2(3x + 5) Kim s dad said, I remember doing something like this in school.

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

Algebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Algebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This document was written and copyrighted by Paul Dawkins. Use of this document and its online version is governed by the Terms and Conditions of Use located at. The online version of this document is

More information

PROBABILITY DISTRIBUTION

PROBABILITY DISTRIBUTION PROBABILITY DISTRIBUTION DEFINITION: If S is a sample space with a probability measure and x is a real valued function defined over the elements of S, then x is called a random variable. Types of Random

More information

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted

More information

Chapter 5 Simplifying Formulas and Solving Equations

Chapter 5 Simplifying Formulas and Solving Equations Chapter 5 Simplifying Formulas and Solving Equations Look at the geometry formula for Perimeter of a rectangle P = L W L W. Can this formula be written in a simpler way? If it is true, that we can simplify

More information

1. A definition of Algebra: a branch of mathematics which describes basic arithmetic relations using variables.

1. A definition of Algebra: a branch of mathematics which describes basic arithmetic relations using variables. MA30S PRECALC UNIT C: ALGEBRA CLASS NOTES. A definition of Algebra: a branch of mathematics which describes basic arithmetic relations using variables.. It is just a language. Instead of saying Jason is

More information

Calculus (Math 1A) Lecture 4

Calculus (Math 1A) Lecture 4 Calculus (Math 1A) Lecture 4 Vivek Shende August 30, 2017 Hello and welcome to class! Hello and welcome to class! Last time Hello and welcome to class! Last time We discussed shifting, stretching, and

More information

1 Review of the dot product

1 Review of the dot product Any typographical or other corrections about these notes are welcome. Review of the dot product The dot product on R n is an operation that takes two vectors and returns a number. It is defined by n u

More information

Discrete Random Variables

Discrete Random Variables Discrete Random Variables We have a probability space (S, Pr). A random variable is a function X : S V (X ) for some set V (X ). In this discussion, we must have V (X ) is the real numbers X induces a

More information

Conditional distributions (discrete case)

Conditional distributions (discrete case) Conditional distributions (discrete case) The basic idea behind conditional distributions is simple: Suppose (XY) is a jointly-distributed random vector with a discrete joint distribution. Then we can

More information

Chapter 9: Interval Estimation and Confidence Sets Lecture 16: Confidence sets and credible sets

Chapter 9: Interval Estimation and Confidence Sets Lecture 16: Confidence sets and credible sets Chapter 9: Interval Estimation and Confidence Sets Lecture 16: Confidence sets and credible sets Confidence sets We consider a sample X from a population indexed by θ Θ R k. We are interested in ϑ, a vector-valued

More information

Mathematics Tutorials. Arithmetic Tutorials Algebra I Tutorials Algebra II Tutorials Word Problems

Mathematics Tutorials. Arithmetic Tutorials Algebra I Tutorials Algebra II Tutorials Word Problems Mathematics Tutorials These pages are intended to aide in the preparation for the Mathematics Placement test. They are not intended to be a substitute for any mathematics course. Arithmetic Tutorials Algebra

More information

Prealgebra. Edition 5

Prealgebra. Edition 5 Prealgebra Edition 5 Prealgebra, Edition 5 2009, 2007, 2005, 2004, 2003 Michelle A. Wyatt (M A Wyatt) 2009, Edition 5 Michelle A. Wyatt, author Special thanks to Garry Knight for many suggestions for the

More information

Chapter 3. Discrete Random Variables and Their Probability Distributions

Chapter 3. Discrete Random Variables and Their Probability Distributions Chapter 3. Discrete Random Variables and Their Probability Distributions 2.11 Definition of random variable 3.1 Definition of a discrete random variable 3.2 Probability distribution of a discrete random

More information

Calculus (Math 1A) Lecture 4

Calculus (Math 1A) Lecture 4 Calculus (Math 1A) Lecture 4 Vivek Shende August 31, 2017 Hello and welcome to class! Last time We discussed shifting, stretching, and composition. Today We finish discussing composition, then discuss

More information

Chapter 3. Discrete Random Variables and Their Probability Distributions

Chapter 3. Discrete Random Variables and Their Probability Distributions Chapter 3. Discrete Random Variables and Their Probability Distributions 1 3.4-3 The Binomial random variable The Binomial random variable is related to binomial experiments (Def 3.6) 1. The experiment

More information

Hi, my name is Dr. Ann Weaver of Argosy University. This WebEx is about something in statistics called z-

Hi, my name is Dr. Ann Weaver of Argosy University. This WebEx is about something in statistics called z- Hi, my name is Dr. Ann Weaver of Argosy University. This WebEx is about something in statistics called z- Scores. I have two purposes for this WebEx, one, I just want to show you how to use z-scores in

More information

Sampling Distributions

Sampling Distributions Sampling Distributions In previous chapters we have discussed taking a single observation from a distribution. More accurately, we looked at the probability of a single variable taking a specific value

More information

The Celsius temperature scale is based on the freezing point and the boiling point of water. 12 degrees Celsius below zero would be written as

The Celsius temperature scale is based on the freezing point and the boiling point of water. 12 degrees Celsius below zero would be written as Prealgebra, Chapter 2 - Integers, Introductory Algebra 2.1 Integers In the real world, numbers are used to represent real things, such as the height of a building, the cost of a car, the temperature of

More information

Definition: Quadratic equation: A quadratic equation is an equation that could be written in the form ax 2 + bx + c = 0 where a is not zero.

Definition: Quadratic equation: A quadratic equation is an equation that could be written in the form ax 2 + bx + c = 0 where a is not zero. We will see many ways to solve these familiar equations. College algebra Class notes Solving Quadratic Equations: Factoring, Square Root Method, Completing the Square, and the Quadratic Formula (section

More information

Continuous Random Variables. What continuous random variables are and how to use them. I can give a definition of a continuous random variable.

Continuous Random Variables. What continuous random variables are and how to use them. I can give a definition of a continuous random variable. Continuous Random Variables Today we are learning... What continuous random variables are and how to use them. I will know if I have been successful if... I can give a definition of a continuous random

More information

V. Probability. by David M. Lane and Dan Osherson

V. Probability. by David M. Lane and Dan Osherson V. Probability by David M. Lane and Dan Osherson Prerequisites none F.Introduction G.Basic Concepts I.Gamblers Fallacy Simulation K.Binomial Distribution L.Binomial Demonstration M.Base Rates Probability

More information

How Much Evidence Should One Collect?

How Much Evidence Should One Collect? How Much Evidence Should One Collect? Remco Heesen October 10, 2013 Abstract This paper focuses on the question how much evidence one should collect before deciding on the truth-value of a proposition.

More information

A-Level Notes CORE 1

A-Level Notes CORE 1 A-Level Notes CORE 1 Basic algebra Glossary Coefficient For example, in the expression x³ 3x² x + 4, the coefficient of x³ is, the coefficient of x² is 3, and the coefficient of x is 1. (The final 4 is

More information

NCERT solution for Integers-2

NCERT solution for Integers-2 NCERT solution for Integers-2 1 Exercise 6.2 Question 1 Using the number line write the integer which is: (a) 3 more than 5 (b) 5 more than 5 (c) 6 less than 2 (d) 3 less than 2 More means moving right

More information

Module 8 Probability

Module 8 Probability Module 8 Probability Probability is an important part of modern mathematics and modern life, since so many things involve randomness. The ClassWiz is helpful for calculating probabilities, especially those

More information

Lecture 4: Training a Classifier

Lecture 4: Training a Classifier Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as

More information

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p). Sampling distributions and estimation. 1) A brief review of distributions: We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation,

More information

Park School Mathematics Curriculum Book 1, Lesson 1: Defining New Symbols

Park School Mathematics Curriculum Book 1, Lesson 1: Defining New Symbols Park School Mathematics Curriculum Book 1, Lesson 1: Defining New Symbols We re providing this lesson as a sample of the curriculum we use at the Park School of Baltimore in grades 9-11. If you d like

More information

Solving Quadratic & Higher Degree Equations

Solving Quadratic & Higher Degree Equations Chapter 9 Solving Quadratic & Higher Degree Equations Sec 1. Zero Product Property Back in the third grade students were taught when they multiplied a number by zero, the product would be zero. In algebra,

More information

Masters Comprehensive Examination Department of Statistics, University of Florida

Masters Comprehensive Examination Department of Statistics, University of Florida Masters Comprehensive Examination Department of Statistics, University of Florida May 6, 003, 8:00 am - :00 noon Instructions: You have four hours to answer questions in this examination You must show

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Bayesian Estimation An Informal Introduction

Bayesian Estimation An Informal Introduction Mary Parker, Bayesian Estimation An Informal Introduction page 1 of 8 Bayesian Estimation An Informal Introduction Example: I take a coin out of my pocket and I want to estimate the probability of heads

More information

Lesson 3-2: Solving Linear Systems Algebraically

Lesson 3-2: Solving Linear Systems Algebraically Yesterday we took our first look at solving a linear system. We learned that a linear system is two or more linear equations taken at the same time. Their solution is the point that all the lines have

More information

Confidence intervals

Confidence intervals Confidence intervals We now want to take what we ve learned about sampling distributions and standard errors and construct confidence intervals. What are confidence intervals? Simply an interval for which

More information

Lecture 4: September Reminder: convergence of sequences

Lecture 4: September Reminder: convergence of sequences 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused

More information

Math Precalculus I University of Hawai i at Mānoa Spring

Math Precalculus I University of Hawai i at Mānoa Spring Math 135 - Precalculus I University of Hawai i at Mānoa Spring - 2014 Created for Math 135, Spring 2008 by Lukasz Grabarek and Michael Joyce Send comments and corrections to lukasz@math.hawaii.edu Contents

More information

4 Branching Processes

4 Branching Processes 4 Branching Processes Organise by generations: Discrete time. If P(no offspring) 0 there is a probability that the process will die out. Let X = number of offspring of an individual p(x) = P(X = x) = offspring

More information

33. SOLVING LINEAR INEQUALITIES IN ONE VARIABLE

33. SOLVING LINEAR INEQUALITIES IN ONE VARIABLE get the complete book: http://wwwonemathematicalcatorg/getfulltextfullbookhtm 33 SOLVING LINEAR INEQUALITIES IN ONE VARIABLE linear inequalities in one variable DEFINITION linear inequality in one variable

More information

Algebra, Part I. x m x = n xm i x n = x m n = 1

Algebra, Part I. x m x = n xm i x n = x m n = 1 Lesson 7 Algebra, Part I Rules and Definitions Rules Additive property of equality: If a, b, and c represent real numbers, and if a=b, then a + c = b + c. Also, c + a = c + b Multiplicative property of

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information