ph: 5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5 sample mean = sample sd = sample size, n = 9


Kathleen Atkins
5 years ago


Name: SOLUTIONS Final Part 1 (100 pts) and Final Part 2 (120 pts)

For all of the questions below, please show enough work that it is completely clear how your final solution was derived. Sit at least one seat apart and put away laptops and cell phones. The quiz is open book, open notes, open cheat sheets, and open calculator.

*1 (4 pts) Sally decides to see if she can make her cat's saliva non-neutral, i.e. acidic, pH < 7, or basic, pH > 7, by putting her beloved pet on a snow cone diet for a week (see figure to left). At the end of the week she measures the pH of the cat's saliva 10 times. After removing an outlier, she analyzes the following 9 measures.

pH: 5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5
sample mean = 6.433333
sample sd = 0.766485
sample size, n = 9

Image used without permission from 13a39e2aaaff36f190d5258ae960

Write the null and alternative hypothesis for a two-sided test of this question. Remember to use Greek letters for parameters.

Ho: µ pH = 7. Ha: µ pH ≠ 7.

*2 (8 pts) Referring to question 1, calculate an appropriate test statistic to three decimal places. Remember, don't round your numbers during calculations. Round at the very end to get three decimals.

( 6.433333 - 7 ) / ( 0.766485 / sqrt(9) ) = -0.566667 / 0.255495 = -2.217915 = -2.218

In R:
> x <- c( 5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5 )
> t.test( x, mu=7 )$statistic
t
-2.217915
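The hand calculation in question 2 can be checked step by step. The following is a minimal sketch in base R; the names `mu0`, `xbar`, `s`, `se`, and `t_stat` are illustrative and do not come from the exam.

```r
# pH measurements after removing the outlier (question 1)
x   <- c(5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5)
mu0 <- 7                       # null value: neutral pH

n      <- length(x)
xbar   <- mean(x)              # sample mean
s      <- sd(x)                # sample standard deviation
se     <- s / sqrt(n)          # standard error of the mean
t_stat <- (xbar - mu0) / se    # one-sample t statistic

# Round only at the end, as the question asks
print(round(c(mean = xbar, sd = s, se = se, t = t_stat), 3))

# Cross-check against the built-in test
t_builtin <- unname(t.test(x, mu = mu0)$statistic)
print(all.equal(t_stat, t_builtin))
```

Computing the pieces separately makes it easy to see where premature rounding would change the third decimal.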

*3 (4 pts) Referring to questions 1-2, what is the appropriate critical value for a 5% significance level two-sided test? Provide the critical value to 3 decimal places.

t 8, 0.975 = 2.306

*4 (4 pts) Referring to questions 1-3, what is an appropriate p-value for this test?

t 8, 0.95 = 1.860, thus the p-value < 0.10. To be explicit, 0.05 < p < 0.10, because the observed |t| = 2.218 falls between 1.860 and 2.306.

In R:
> t.test( x, mu=7 )$p.value
[1]

*5 (8 pts) Referring to questions 1-4, calculate a 95% confidence interval for the mean pH of Sally's cat's saliva after exposure to the snow cone diet. Write your answer to three decimals.

6.433333 ± 2.306 * 0.766485 / sqrt(9) = 6.433333 ± 2.306 * 0.255495 = 6.433333 ± 0.589172 = ( 5.844161, 7.022505 ) = ( 5.844, 7.023 )

Wrong answer (uses the normal critical value instead of the t critical value): 6.433333 ± 1.96 * 0.255495 = ( 5.933, 6.934 )
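Questions 3 to 5 all rest on the same standard error and the t distribution with 8 degrees of freedom. The sketch below, in base R with illustrative variable names, reproduces the critical value, the two-sided p-value, and the 95% confidence interval, and also shows the interval obtained when the normal critical value is used by mistake.

```r
x  <- c(5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5)
n  <- length(x)
df <- n - 1
se <- sd(x) / sqrt(n)

t_stat  <- (mean(x) - 7) / se
t_crit  <- qt(0.975, df)                  # two-sided 5% critical value
p_value <- 2 * pt(-abs(t_stat), df)       # two-sided p-value

ci_t <- mean(x) + c(-1, 1) * t_crit * se  # correct: t critical value
ci_z <- mean(x) + c(-1, 1) * 1.96 * se    # "wrong answer": normal critical value

print(round(t_crit, 3))
print(p_value)
print(round(ci_t, 3))
print(round(ci_z, 3))
```

With only 9 observations the t interval is noticeably wider than the normal-based one, which is why the latter is marked as wrong.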

*6 (3 pts) Referring to question 1, the outlier Sally dropped was a 2.1. Here is the data with the outlier included and corresponding R output. (The question is at the bottom of the page.)

> ph <- c(2.1, 5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5)
> mean(ph)
[1] 6
> sd(ph)
[1]
> t.test(ph, mu=7)

One Sample t-test

data: ph
t = , df = 9, p-value =
alternative hypothesis: true mean is not equal to 7
95 percent confidence interval:

sample estimates:
mean of x
6

> wilcox.test(ph, mu=7)

Wilcoxon signed rank test with continuity correction

data: ph
V = 7, p-value =
alternative hypothesis: true location is not equal to 7

If analyzing all 10 points of data, which test should Sally use and why?

Sally should use the Wilcoxon signed rank test because the t test is sensitive to outliers, especially with a small sample size. The outlier makes the assumption of normality questionable, thus a non-parametric test is more appropriate.

*7 (4 pts) In a one-sample case, the Wilcoxon signed rank test is for testing the null hypothesis, Ho: F(X) is symmetric about µo, which is often simplified to the easier to understand though technically inaccurate Ho: median(X) = µo, where µo depends on the context of the problem. In the context of question 6, what is an appropriate null and alternative hypothesis for the problem?

Best answer:
Ho: F(X pH) is symmetric about 7.
Ha: F(X pH) is not symmetric about 7.

Acceptable though technically inaccurate answer:
Ho: median(X pH) = 7
Ha: median(X pH) ≠ 7

*8 (3 pts) Referring to questions 6-7, write a brief conclusion for the analysis of all 10 data points using a 5% significance level. Include an appropriate p-value in your conclusion.

We reject the null hypothesis that the distribution of pH measures is symmetric about 7, p-value = . The data suggest that the cat's saliva is slightly acidic.
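To see where the statistic V = 7 in question 6 comes from, the signed rank test can be rebuilt by hand. The sketch below uses base R only and follows the usual normal approximation with a tie correction and a continuity correction; the variable names are illustrative, and the comparison with `wilcox.test` is included as a sanity check rather than as a statement of how the exam answer was produced.

```r
ph <- c(2.1, 5.2, 5.6, 5.8, 6.4, 6.5, 6.8, 6.9, 7.2, 7.5)
d  <- ph - 7                    # differences from the hypothesized center
r  <- rank(abs(d))              # tied absolute differences share an average rank
V  <- sum(r[d > 0])             # sum of the ranks of the positive differences

n     <- length(d)
ties  <- table(r)               # sizes of the tied groups
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z0    <- V - n * (n + 1) / 4    # distance of V from its null mean
z     <- (z0 - sign(z0) * 0.5) / sigma   # continuity correction
p_approx <- 2 * pnorm(-abs(z))

# Built-in test; the warning about ties is expected and suppressed here
p_builtin <- suppressWarnings(wilcox.test(ph, mu = 7)$p.value)

print(V)
print(c(by_hand = p_approx, builtin = p_builtin))
```

Only two of the ten differences are positive (7.2 and 7.5), and both are small, which is why V is so far below its null mean of 27.5.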

*9 (8 pts) During class we collected the following M&Ms data.

Type -- Color      Blue   Red   Orange   Yellow   Green   Brown   Total
Milk chocolate       86    51       76       65      57      70     405
Dark                 57    50       95       48      99      68     417

Create a 95% confidence interval for the risk difference (RD) for the risk of blue for Milk Chocolate M&Ms minus the risk of blue for Dark M&Ms. Write your answer to 3 decimal places, remembering to only round your numbers after you're done with all your calculations. We haven't learned the Wilson interval for the risk difference, so we'll use a Wald interval.

Image used without permission from com/blog/uploaded_images/unsi gneduser_662.unsignedchar_ png

( p milk - p dark ) ± 1.96 * sqrt( p milk * (1 - p milk) / n milk + p dark * (1 - p dark) / n dark )
= ( 86/405 - 57/417 ) ± 1.96 * sqrt( 86/405 * 319/405 / 405 + 57/417 * 360/417 / 417 )
= 0.075655 ± 1.96 * sqrt( 0.000695964 )
= 0.075655 ± 1.96 * 0.026381
= 0.075655 ± 0.051707
= ( 0.023948, 0.127362 )
= ( 0.024, 0.127 )
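The Wald interval for the risk difference in question 9 is short enough to compute directly. This is a minimal base R sketch with illustrative names; it uses the multiplier 1.96 so that it matches a hand calculation from a normal table.

```r
# Blue counts and totals from the M&Ms table
x_milk <- 86; n_milk <- 405
x_dark <- 57; n_dark <- 417

p_milk <- x_milk / n_milk
p_dark <- x_dark / n_dark

rd <- p_milk - p_dark                       # risk difference, milk minus dark
se <- sqrt(p_milk * (1 - p_milk) / n_milk +
           p_dark * (1 - p_dark) / n_dark)  # Wald standard error
ci <- rd + c(-1, 1) * 1.96 * se

print(round(c(RD = rd, lower = ci[1], upper = ci[2]), 3))
```

Because the interval excludes 0, the data suggest blue is more common among the milk chocolate M&Ms in this sample.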

*10 (10 pts) Referring to the data in question 9, create a 95% confidence interval for the odds ratio (OR) for the odds of blue for milk chocolate M&Ms versus the odds of blue for dark M&Ms. Write your answer to 3 decimal places, remembering to only round your numbers after you're done with all your calculations.

Rewriting the data in our a, b, c, d table format we have

         Blue      Not Blue
Milk     a = 86    b = 319
Dark     c = 57    d = 360

Odds ratio = ad/bc = (86*360)/(57*319) = 1.702689

To double check that we've set up the table correctly, we can calculate the odds first and then calculate the ratio.

Odds blue for milk = (86/405) / (319/405) = 0.269592
Odds blue for dark = (57/417) / (360/417) = 0.158333
Odds ratio = 0.269592 / 0.158333 = 1.702689

95% CI, assuming the n's are big enough that log(OR) is roughly normally distributed:

exp( ln(ad/bc) ± 1.96 * sqrt(1/a + 1/b + 1/c + 1/d) )
= exp( ln( (86*360)/(57*319) ) ± 1.96 * sqrt(1/86 + 1/319 + 1/57 + 1/360) )
= exp( ln( 1.702689 ) ± 1.96 * sqrt( 0.035084 ) )
= exp( 0.532209 ± 1.96 * 0.187308 )
= exp( 0.532209 ± 0.367124 )
= exp( 0.165085, 0.899333 )
= ( 1.179493, 2.457963 )
= ( 1.179, 2.458 ).
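The odds ratio interval in question 10 is built on the log scale and then exponentiated. The sketch below repeats that calculation in base R. The names `n11`, `n12`, `n21`, `n22` are illustrative stand-ins for the a, b, c, d cells, chosen to avoid masking R's `c()` function, and 1.96 is used to match the hand calculation.

```r
# 2x2 table cells: rows Milk/Dark, columns Blue/Not Blue
n11 <- 86;  n12 <- 319   # a, b
n21 <- 57;  n22 <- 360   # c, d

or     <- (n11 * n22) / (n12 * n21)               # ad / bc
se_log <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
ci     <- exp(log(or) + c(-1, 1) * 1.96 * se_log) # back-transform the log-scale interval

# Double check via the two odds
odds_milk <- (n11 / (n11 + n12)) / (n12 / (n11 + n12))
odds_dark <- (n21 / (n21 + n22)) / (n22 / (n21 + n22))

print(c(OR = or, OR_check = odds_milk / odds_dark))
print(ci)
```

The lower limit sits very close to a rounding boundary at the third decimal, so carrying full precision until the end matters here.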

Image used without permission from http://www.theiphoneblog.com/images/stories/2009/03/app_store_church_lady.jpg

*11 (4 pts) In order to ensure that patient examinations are of high quality, a hospital employs a team of reviewers to do routine chart reviews on samples of exams. This team of reviewers rates the examinations as passing or failing. Two new reviewers, Pat and Terry, are being tested on their rating accuracy. They both rate 300 exams for which a gold standard rating has been done. Thus, it is known if their ratings are correct for these 300 exams. The results follow.

                   Pat Correct    Pat Incorrect
Terry Correct
Terry Incorrect

Given that Pat and Terry are both rating the same exams, what is an appropriate statistical test with which to test whether Pat or Terry is the better reviewer? Provide the test's name.

McNemar's Test for paired categorical data.

**12 (10 pts) Calculate a p-value for a two-sided test of the question in problem 11. Write your answer to 4 decimal places, remembering to only round your numbers after you're done with all your calculations.

n D, the number of discordant pairs = 80, which is > 20. Therefore, we can use the normal theory test. Similar rules of thumb: n*po*qo = 80*0.5*0.5 = 20 > 5. Also, n*phat*qhat = 80*48/80*32/80 = 19.2 > 5. So by all rules of thumb, the normal approximation is fine.

With continuity correction:

( |n A - n D/2| - ½ )^2 / (n D/4) = ( |48 - 80/2| - ½ )^2 / (80/4) = 7.5^2 / 20 = 2.8125
( |n A - n B| - 1 )^2 / (n A + n B) = ( |48 - 32| - 1 )^2 / 80 = 15^2 / 80 = 2.8125

χ2 1,.90 = 2.71
χ2 1,.95 = 3.84

Therefore, the p-value < 0.10. Explicitly, 0.05 < p < 0.10. Remember the χ2 is for the two-sided test, so you don't need to double the p-values here.

Because χ2 1 is Z^2, we can be more precise if we wish. z = sqrt(2.8125) = 1.6771 = 1.68. Using Rosner table 3, p-value = 2*0.0465 = 0.0930. Because we used the Z curve, we did need to double the p-value for the two-sided test.

Without continuity correction:

( n A - n B )^2 / (n A + n B) = ( 48 - 32 )^2 / 80 = 16^2 / 80 = 3.2

Therefore, 0.05 < p < 0.10. More precisely, z = sqrt(3.2) = 1.7889 = 1.79, gives us p-value = 2*0.0367 = 0.0734.
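McNemar's test depends only on the discordant pairs, so the p-values in question 12 can be reproduced from the two counts 48 and 32 alone. The following base R sketch uses illustrative names and does not assume which reviewer accounts for which discordant count, since the two-sided statistic is the same either way. It computes exact tail areas rather than table lookups, so the p-values differ slightly from the table-based ones.

```r
n_a <- 48                 # discordant pairs of one type
n_b <- 32                 # discordant pairs of the other type
n_d <- n_a + n_b          # total discordant pairs

chi2_cc <- (abs(n_a - n_b) - 1)^2 / n_d   # with continuity correction
chi2_nc <- (n_a - n_b)^2 / n_d            # without continuity correction

# Upper tail of chi-square with 1 df is already the two-sided p-value
p_cc <- pchisq(chi2_cc, df = 1, lower.tail = FALSE)
p_nc <- pchisq(chi2_nc, df = 1, lower.tail = FALSE)

# Same p-value via the Z curve, where the tail area must be doubled
p_cc_z <- 2 * pnorm(-sqrt(chi2_cc))

print(c(chi2_cc = chi2_cc, chi2_nc = chi2_nc))
print(round(c(p_cc = p_cc, p_cc_z = p_cc_z, p_nc = p_nc), 4))
```

The agreement between `p_cc` and `p_cc_z` illustrates the remark that a chi-square with 1 degree of freedom is a squared standard normal.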

**13 (10 pts) We saw in question 12 that reviewers sometimes give incorrect ratings. In an effort to improve the accuracy of the ratings, the hospital uses the following system. An examination is rated by two reviewers. If the two reviewers agree (both rate pass or both rate fail), then the agreed upon rating is given. If the two reviewers disagree (one rates pass and the other fail), then a third reviewer reviews the examination, with their rating breaking the tie. Suppose the probability of a single reviewer correctly rating an exam is 70% for all reviewers. What is the probability that the hospital's method will yield the correct exam rating? Give your answer to three decimal places. Think through all of the scenarios carefully.

P(correct rating given) = P(1st correct & 2nd correct) + P(1st correct & 2nd wrong & 3rd correct) + P(1st wrong & 2nd correct & 3rd correct)

Assuming the reviewers' ratings are independent,

P(correct rating given) = P(1st correct) * P(2nd correct) + P(1st correct) * P(2nd wrong) * P(3rd correct) + P(1st wrong) * P(2nd correct) * P(3rd correct)
= 0.7 * 0.7 + 0.7 * 0.3 * 0.7 + 0.3 * 0.7 * 0.7
= 0.784
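One way to confirm that no scenario was missed in question 13 is to enumerate all eight combinations of correct and incorrect ratings. The sketch below does this in base R under the same independence assumption; the column names `r1`, `r2`, `r3` are illustrative, and the third reviewer is carried along in every row even when the first two agree, which does not change the total.

```r
p <- 0.7   # probability that any single reviewer is correct

# All 2^3 combinations of correct (TRUE) / incorrect (FALSE) ratings
outcomes <- expand.grid(r1 = c(TRUE, FALSE),
                        r2 = c(TRUE, FALSE),
                        r3 = c(TRUE, FALSE))

# Probability of each combination under independence
prob <- with(outcomes,
             ifelse(r1, p, 1 - p) * ifelse(r2, p, 1 - p) * ifelse(r3, p, 1 - p))

# Hospital rule: if the first two agree, use their rating; otherwise the third decides
final_correct <- with(outcomes, ifelse(r1 == r2, r1, r3))

p_correct <- sum(prob[final_correct])
print(p_correct)
```

The rule is equivalent to a majority vote of three reviewers, which is why the result exceeds the single-reviewer accuracy of 0.7.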

**14 (10 pts) Sally's parents form a mini IRB and tell their daughter that any of her future experiments on her pets need to be sufficiently powered. Sally wants to redo her snow cone experiment from question 1. If she assumes that the pH of her cat's saliva after the diet will be normally distributed with a mean of 6.5 and a standard deviation of 1, how many pH measures will she need to take to have her study powered at 80%? In other words, what sample size will yield a power of 80%? Assume Sally will be doing a two-sided test at a 5% significance level with the assumed standard deviation treated as known.

The power for a one-sample Z test for the mean of a population with a two-sided alternative at a 5% significance level is approximately Φ[ -1.96 + |µo - µ1| / ( σ/sqrt(n) ) ]. We want to choose n such that Φ[ -1.96 + |µo - µ1| / ( σ/sqrt(n) ) ] = 0.80. We solve as follows.

Φ[ -1.96 + |µo - µ1| / ( σ/sqrt(n) ) ] = 0.80
-1.96 + |µo - µ1| / ( σ/sqrt(n) ) = 0.85, by choosing the appropriate value from the normal table.
-1.96 + |7 - 6.5| / ( 1/sqrt(n) ) = 0.85, by inserting the assumed values.
0.5 / ( 1/sqrt(n) ) = 0.85 + 1.96
sqrt(n) = ( 0.85 + 1.96 ) / 0.5
sqrt(n) = 5.62
n = 31.6

Therefore, rounding up, n = 32.
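The sample size in question 14 follows from solving the approximate power formula for n. The base R sketch below does the same with exact normal quantiles instead of table values, so the unrounded n differs slightly from 31.6 but rounds up to the same answer. The helper `approx_power` is an illustrative name, not something from the course.

```r
mu0   <- 7      # null mean
mu1   <- 6.5    # assumed true mean
sigma <- 1      # assumed known standard deviation
alpha <- 0.05
power <- 0.80

z_alpha <- qnorm(1 - alpha / 2)   # about 1.96
z_beta  <- qnorm(power)           # about 0.84

n_exact  <- ((z_alpha + z_beta) * sigma / abs(mu0 - mu1))^2
n_needed <- ceiling(n_exact)      # always round up to keep power at or above target

# Approximate two-sided power for a given n (dominant tail only)
approx_power <- function(n) pnorm(-z_alpha + abs(mu0 - mu1) * sqrt(n) / sigma)

print(c(n_exact = n_exact, n_needed = n_needed))
print(c(power_31 = approx_power(31), power_32 = approx_power(32)))
```

Checking the power at 31 and 32 shows why rounding down would leave the study slightly underpowered.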

***15 (10 pts) Suppose you have two normally distributed samples, both of sample size n, as follows.

X1, X2, ..., Xn ~iid Normal( µ=10, σ=1 )
Y1, Y2, ..., Yn ~iid Normal( µ=50, σ=1 )

Consider performing a two-sided t test on X vs Y and a two-sided Wilcoxon rank sum test on X vs Y. What is the smallest sample size n such that the t test p-value and the Wilcoxon test p-value will agree to four decimal places with near certainty? In other words, choose the smallest n such that P( |p-value t test - p-value Wilcoxon| < ) > . Include proof that your n is the correct answer.

First notice that the means of X and Y are 40 units apart, while their standard deviations are only 1. Thus, with probability all of the values of X and Y will be within 5 standard deviations of the true mean. All values of X will be within 5 to 15, and all of the values of Y will be within 45 to 55.

Consequently, by the time we get to n = 4, the p-value for the t test will be essentially 0. Consider as an upper bound for probable t test p-values the p-value from a random sample X = {5, 5, 15, 15} and Y = {45, 45, 55, 55}, p-value = . If n = 6, X = {5, 5, 5, 15, 15, 15} and Y = {45, 45, 45, 55, 55, 55} yields p-value = .

Better yet, notice that the power = 1 for the t test when n = 3 and alpha = ; in other words, the p-value for the t test will be 0 to four decimals with probability 1 with n > 3.

> power.t.test(n = 3, delta = 40, sd = 1, sig.level = , type = "two.sample", alternative = "two.sided")$power = 1.

So if the t test p-value goes to 0 very quickly, this question is essentially: what's the smallest n where the p-value of the Wilcoxon test is 0 to four decimals?

P( |p-value t test - p-value Wilcoxon| < ) > is equivalent to asking P( p-value Wilcoxon < ) > .

We've already observed that samples from X and Y will essentially never overlap. So the Wilcoxon test will always take its smallest possible p-value. Thus we solve for the smallest n where p-value Wilcoxon < . In general, the two-sided p-value will equal 2 / choose(2n, n).

n    p-value

Thus, we've shown 9 is the smallest n such that P( |p-value t test - p-value Wilcoxon| < ) > .
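The formula 2 / choose(2n, n) for the smallest possible two-sided Wilcoxon rank sum p-value can be tabulated directly. In the sketch below, the cutoff `threshold <- 0.00005` is an assumption: the numeric threshold in the question did not survive in this copy, and 0.00005 is simply the point below which a p-value prints as 0.0000 at four decimals. The check against `wilcox.test` uses two made-up, completely separated samples.

```r
n     <- 1:12
p_min <- 2 / choose(2 * n, n)   # smallest two-sided p-value when samples never overlap

print(data.frame(n = n, p_min = signif(p_min, 4)))

# ASSUMPTION: "0 to four decimals" is read as p < 0.00005
threshold  <- 0.00005
smallest_n <- min(n[p_min < threshold])
print(smallest_n)

# Sanity check with two fully separated samples of size 9 (illustrative data)
p_check <- wilcox.test(1:9, 101:109)$p.value
print(all.equal(p_check, 2 / choose(18, 9)))
```

Under that reading of the threshold, n = 8 still gives a p-value of about 0.00016, while n = 9 drops to about 0.00004, which is consistent with the stated answer of 9.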

***16 (5 pts) Using calculus, show that the maximum likelihood estimator for the proportion of blue M&Ms from a sample of size n with x blue M&Ms observed is equal to x/n, i.e. show that x/n is the maximum likelihood estimator of θ.

The likelihood has a single maximum for θ in the range (0, 1). To solve for that maximum, we can take the first derivative of L(θ | x) with respect to θ, set it equal to zero, and solve for θ.

***17 (5 pts) Referring to question 16, when using a beta prior distribution, B(α, β), show that the mean posterior estimator for the proportion of blue M&Ms is equal to x/n only when α = 1 and β = 1.

From the Bayes workshop, we know that with a Beta(α, β) prior, we have a posterior f(θ | x) = θ^(α - 1 + x) * (1 - θ)^(β - 1 + n - x) / B(α - 1 + x, β - 1 + n - x). This is also a beta distribution, B(α - 1 + x, β - 1 + n - x). The mean of a beta distribution, B(a, b), is a/(a+b). Thus, the mean of the posterior is (α - 1 + x) / (α - 1 + x + β - 1 + n - x) = (x + α - 1) / (n + α + β - 2). Therefore, the posterior mean will only equal x/n if α = 1 and β = 1.

Note that this argument labels the posterior by its exponents. In the standard parameterization, where B(a, b) has density proportional to θ^(a - 1) * (1 - θ)^(b - 1) and mean a/(a+b), the same posterior is B(α + x, β + n - x): its mean is (α + x) / (α + β + n), and (x + α - 1) / (n + α + β - 2) is its mode, which equals x/n when α = 1 and β = 1.
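The calculus steps for question 16 are not shown in this copy. One standard way to carry them out, assuming the usual binomial likelihood for x blue M&Ms out of n and working with the log-likelihood, is sketched below; the symbol \ell for the log-likelihood is introduced here for convenience.

```latex
\begin{align*}
L(\theta \mid x) &= \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x},
  \qquad 0 < \theta < 1 \\
\ell(\theta) = \ln L(\theta \mid x)
  &= \ln\binom{n}{x} + x\ln\theta + (n-x)\ln(1-\theta) \\
\frac{d\ell}{d\theta}
  &= \frac{x}{\theta} - \frac{n-x}{1-\theta} = 0 \\
x(1-\theta) &= (n-x)\,\theta
  \quad\Longrightarrow\quad x = n\theta
  \quad\Longrightarrow\quad \hat{\theta} = \frac{x}{n} \\
\frac{d^{2}\ell}{d\theta^{2}}
  &= -\frac{x}{\theta^{2}} - \frac{n-x}{(1-\theta)^{2}} < 0
  \quad \text{for } 0 < x < n
\end{align*}
```

Maximizing the log-likelihood gives the same answer as maximizing L itself because the logarithm is strictly increasing, and the negative second derivative confirms that the stationary point is a maximum.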

*18 (3 pts) Referring to question 9, create a 95% confidence interval for the relative risk (RR) for the risk of blue for Milk Chocolate M&Ms versus the risk of blue for Dark M&Ms. Write your answer to 3 decimal places.

Rewriting the data in our a, b, c, d table format we have

         Blue      Not Blue
Milk     a = 86    b = 319
Dark     c = 57    d = 360

Relative Risk = ( a*(c+d) ) / ( (a+b) * c ) = ( 86 * (57+360) ) / ( (86+319) * 57 ) = 1.553

I'm tired, so I'm going to make R do this.

install.packages('epitools')
library('epitools')
MMs <- matrix( c(86,57,319,360), 2, 2)
dimnames(MMs) <- list( "Type" = c("Milk", "Dark"), "Color" = c("Blue", "NotBlue") )
MMs
epitab(MMs, method="riskratio", rev='b')

( 1.144, 2.109 ), or ( 1.14, 2.11 )

*19 (3 pts) Referring to question 9, create a 1/8th support interval for the relative risk (RR) for the risk of blue for Milk Chocolate M&Ms versus the risk of blue for Dark M&Ms. Write your answer to 2-3 decimal places.

rr.lik(86, 405, 57, 417)

( 1.14, 2.14 )

*20 (3 pts) Referring to question 9, create a 95% credible interval for the relative risk (RR) for the risk of blue for Milk Chocolate M&Ms versus the risk of blue for Dark M&Ms using independent Beta(1, 1) prior distributions for the proportion of blue M&Ms in Milk Chocolate and Dark Chocolate. Write your answer to 2-3 decimal places.

posterior1 <- rbeta( 3*10^6, , )
posterior2 <- rbeta( 3*10^6, , )
posteriorrr <- posterior1 / posterior2
round( quantile( posteriorrr, probs=c(0.025, 0.975) ), 3 )
2.5% 97.5%

( 1.150, )

Note, the last decimal place isn't stable, even with 3 million samples from each posterior. ( 1.15, 2.13 ) is more stable.

*21 (1 pt) Referring to questions 18-20, how do the intervals for the RR from the three statistical paradigms compare to each other?

They are very similar.
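The frequentist interval in question 18 can also be computed without an add-on package. The sketch below uses base R and the usual Wald interval on the log scale; that this matches what the epitools call reports is an assumption based on agreement with the stated lower limit of 1.144. Variable names are illustrative.

```r
x_milk <- 86; n_milk <- 405   # blue and total, milk chocolate
x_dark <- 57; n_dark <- 417   # blue and total, dark

rr <- (x_milk / n_milk) / (x_dark / n_dark)   # relative risk, milk vs dark

# Wald standard error of log(RR)
se_log <- sqrt(1 / x_milk - 1 / n_milk + 1 / x_dark - 1 / n_dark)
ci     <- exp(log(rr) + c(-1, 1) * 1.96 * se_log)

print(round(c(RR = rr, lower = ci[1], upper = ci[2]), 3))
```

The result sits close to the support interval and the credible interval quoted in questions 19 and 20, which is the point made in question 21.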


Name: SOLUTIONS Final Part 1 (In class, solo work, open book and notes) Throughout the exam, show your work and, unless specified otherwise, round all your final answers to 3 decimal places, e.g. 1.0015