Blue Not Blue

Silvester Cody Boone
6 years ago

Name: SOLUTIONS
Final Exam (take home, open everything)

One mouse study showed Brilliant Blue G, the food coloring used in blue M&Ms, accelerated healing of spinal injuries. Is the proportion of blue M&Ms different for different flavors? Below is the class's count data that made it into REDCap.

                Milk Chocolate   Peanut   Pretzel
    Blue                    66       18        35
    Not Blue               172       77        94

*Q1 2pts) In mathematical terms, using Greek letters to represent parameters, what is the null hypothesis for a Chi-squared test of this data?
Ho: θ_milk = θ_peanut = θ_pretzel, where θ represents the probability of being blue.

*Q2 2pts) At a 5% significance level, what would be the rejection region for this test?
Reject Ho if the test statistic exceeds 5.99 (note the test has 2 degrees of freedom).

*Q3 5pts) Under the null hypothesis, what is the expected number of Blue Milk Chocolate M&Ms, i.e. when calculating the Chi-square test statistic, what would you use for the expected count E for the Blue Milk Chocolate M&Ms?
N_milk = 238. θ_pooled = 119/462. E_milk = 238*119/462 = 61.30
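The expected count in Q3 can be checked for every cell at once. The following is a minimal sketch in R, assuming the counts passed to `chisq.test()` in the solutions; the variable names `obs`, `expected`, `x2`, and `crit` are illustrative and not part of the original solutions.

```r
# Observed counts, taken from the chisq.test() call in the solutions.
# Rows: Blue, Not Blue. Columns: Milk Chocolate, Peanut, Pretzel.
obs <- matrix(c(66, 172, 18, 77, 35, 94), nrow = 2)

# Expected count under Ho = (row total * column total) / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Expected Blue Milk Chocolate count, i.e. 238 * 119 / 462.
expected[1, 1]

# Chi-squared statistic built directly from the expected counts.
x2 <- sum((obs - expected)^2 / expected)

# 5% critical value with (2 - 1) * (3 - 1) = 2 degrees of freedom.
crit <- qchisq(0.95, df = 2)
```

Comparing `x2` with `crit` reproduces the rejection rule from Q2, and `x2` should agree with the statistic that `chisq.test(obs)` reports.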

*Q4 4pts) Calculate the Chi-square test statistic.
In R:
    > chisq.test(matrix(c(66,172,18,77,35,94), nrow=2))$statistic
    X-squared

*Q5 2pts Take Home Only) Calculate the p-value.
In R:
    > chisq.test(matrix(c(66,172,18,77,35,94), nrow=2))$p.value
    [1]

*Q6 7pts) Set up the calculations for the 95% confidence interval for the relative risk of being blue for milk chocolate vs peanut M&Ms. Write the final solution as (LB, UB) to four decimals.
    > exp(log((66/238)/(18/95)) - 1.96*sqrt(172/66/238+77/18/95))
    [1]
    > exp(log((66/238)/(18/95)) + 1.96*sqrt(172/66/238+77/18/95))
    [1]
(0.9205, )

*Q7 7pts) Set up the calculations for the 95% confidence interval for the odds ratio of being blue for milk chocolate vs peanut M&Ms. Write the final solution as (LB, UB) to four decimals.
    > exp(log((66*77)/(18*172)) - 1.96*sqrt(1/66+1/18+1/172+1/77))
    [1]
    > exp(log((66*77)/(18*172)) + 1.96*sqrt(1/66+1/18+1/172+1/77))
    [1]
(0.9132, )
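The Q6 and Q7 intervals share one pattern: take the log of the ratio, add and subtract 1.96 standard errors, then exponentiate. A minimal sketch of that pattern follows, using the counts from the solutions; the variable names are illustrative assumptions, not the instructor's.

```r
# Counts from the solutions: milk chocolate and peanut, blue vs not blue.
blue_milk   <- 66; not_milk   <- 172
blue_peanut <- 18; not_peanut <- 77
n_milk   <- blue_milk + not_milk       # 238
n_peanut <- blue_peanut + not_peanut   # 95

z <- 1.96  # same normal quantile the solutions use

# Relative risk: CI built on the log scale, then back-transformed.
log_rr    <- log((blue_milk / n_milk) / (blue_peanut / n_peanut))
se_log_rr <- sqrt(not_milk / blue_milk / n_milk +
                  not_peanut / blue_peanut / n_peanut)
rr_ci <- exp(log_rr + c(-1, 1) * z * se_log_rr)

# Odds ratio: same recipe with the usual sum-of-reciprocals standard error.
log_or    <- log((blue_milk * not_peanut) / (blue_peanut * not_milk))
se_log_or <- sqrt(1 / blue_milk + 1 / blue_peanut + 1 / not_milk + 1 / not_peanut)
or_ci <- exp(log_or + c(-1, 1) * z * se_log_or)

round(rr_ci, 4)
round(or_ci, 4)
```

Both lower bounds fall just below 1, so neither interval excludes a ratio of 1 at the 5% level.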

An M&Ms factory uses trained laborers to detect and remove defective M&Ms, i.e. misshapen or miscolored candies. They are testing a new digital scanning device to see how it performs against the laborers. They create a set of 10,000 M&Ms that they know have 9900 acceptable and 100 defective M&Ms. They run this same set of M&Ms through the laborers' regular routine and through the digital scanner. The data for correctly identifying the 100 true positives is below for each method.

[Table: Laborers Correct / Laborers Incorrect versus Digital Correct / Digital Incorrect; cell counts not recovered. The R calls below use the paired counts 89, 9, 1, 1.]

*Q8 2pts) In mathematical terms, using Greek letters to represent parameters, what is the null hypothesis for a test of this data?
Ho: θ_laborers = θ_digital, where θ represents the sensitivity of the method, i.e. the probability of correctly identifying a truly defective M&M.

**Q9 7pts) Calculate an appropriate two-sided p-value. Write your solution to 4 decimal places.
    > binom.test( x=9, n=10, p=0.5 ) # best solution
    Exact binomial test
    number of successes = 9, number of trials = 10, p-value =

    > prop.test( x=9, n=10, p=0.5, correct = TRUE ) # okay solution
    1-sample proportions test with continuity correction
    X-squared = 4.9, df = 1, p-value =

    > mcnemar.test(matrix(c(89,9,1,1), nrow=2), correct=TRUE ) # okay solution
    McNemar's Chi-squared test with continuity correction
    McNemar's chi-squared = 4.9, df = 1, p-value =

    > prop.test( x=9, n=10, p=0.5, correct = FALSE ) # fails to account for small sample size
    1-sample proportions test without continuity correction
    X-squared = 6.4, df = 1, p-value =

    > mcnemar.test(matrix(c(89,9,1,1), nrow=2), correct=FALSE ) # fails to account for small sample size
    McNemar's Chi-squared test
    McNemar's chi-squared = 6.4, df = 1, p-value =
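The "best solution" in Q9 works because only the discordant pairs (candies one method caught and the other missed) carry information about a difference in sensitivity. The sketch below, with illustrative variable names, shows how the exact binomial test arises from the paired table given to `mcnemar.test()` in the solutions.

```r
# Paired 2x2 table, entered exactly as in the solutions' mcnemar.test() call.
tab <- matrix(c(89, 9, 1, 1), nrow = 2)

# The off-diagonal cells are the discordant pairs: 9 + 1 = 10.
n_discordant <- tab[2, 1] + tab[1, 2]

# Under Ho each discordant pair is equally likely to favor either method,
# so the count in one off-diagonal cell is Binomial(10, 0.5).
exact_p <- binom.test(tab[2, 1], n_discordant, p = 0.5)$p.value

# The same p-value by hand: outcomes at least as extreme as 9 of 10
# are 0, 1, 9, and 10, and the distribution is symmetric.
by_hand <- 2 * sum(dbinom(9:10, size = 10, prob = 0.5))

c(exact_p, by_hand)
```

The 89 and 1 concordant pairs never enter the calculation, which is why a test on 10 observations calls for an exact method or a continuity correction.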

*Q10 3pts) I'm at a winter holiday party and, without paying much attention, grab three M&Ms from a bowl. When I sit down on the couch, I notice all three M&Ms are green. I know regular M&Ms are 16% green, whereas the winter holiday M&Ms are evenly divided between just green and red, i.e. 50% green. What is the statistical likelihood the bowl was filled with winter holiday M&Ms?
0.50^3 = 0.125

*Q11 3pts) What is the statistical likelihood that the bowl was filled with regular M&Ms?
0.16^3 = 0.004096

*Q12 2pts) What is the likelihood ratio test statistic comparing the hypothesis H1: bowl was filled with winter holiday M&Ms vs. H2: bowl was filled with regular M&Ms?
0.50^3/0.16^3 = 30.52

**Q13 5pts) Interpret the likelihood ratio test statistic, i.e. what does the evidence say and how strong is it?
There is moderately strong evidence in favor of the bowl being filled with holiday M&Ms over regular M&Ms.

An approximate formula for the surface area of an ellipsoid is [formula not recovered], where p = [value not recovered] and h, w, and d are the radii of the height, width, and depth. Below is a selection of the estimated surface areas of M&Ms from the class's data.

    Milk Chocolate: 450.2, 334.9, 355.9, 345.2
    Pretzel:        530.9, 769.0, 452.4

*Q14 3pts) What is the null hypothesis for a Wilcoxon-Mann-Whitney Rank Sum Test comparing the surface areas of milk chocolate and pretzel M&Ms?
Ho: F_milk(X) = F_pretzel(X), i.e. the types of M&Ms have the same distribution.

*Q15 3pts) What is the Wilcoxon-Mann-Whitney Rank Sum Test statistic?
Per the lectures, W = RankSumBigger = 5 + 6 + 7 = 18. You could also standardize that.
    wilcox.test( c(530.9, 769.0, 452.4), c(450.2, 334.9, 355.9, 345.2) ) gives W = 12.
    wilcox.test( c(450.2, 334.9, 355.9, 345.2), c(530.9, 769.0, 452.4) ) gives W = 0.
They are all based on mathematically equivalent expressions.

**Q16 6pts) Provide a two-sided p-value to four decimals for the rank sum test.
By hand: 2 * 1 / (7 choose 4) = 2/35 = 0.0571
By R: p-value =
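The by-hand answers to Q15 and Q16 can be reproduced by enumerating every way the three pretzel ranks could fall among the seven pooled observations. This is a small sketch with illustrative variable names; it relies on there being no ties in the data.

```r
milk    <- c(450.2, 334.9, 355.9, 345.2)
pretzel <- c(530.9, 769.0, 452.4)

# Rank the pooled sample; the pretzel values occupy positions 5 to 7.
ranks     <- rank(c(milk, pretzel))
w_pretzel <- sum(ranks[5:7])        # 5 + 6 + 7 = 18

# R's W subtracts the smallest possible rank sum, n(n + 1)/2, for the
# first sample passed to wilcox.test().
w_r <- w_pretzel - 3 * (3 + 1) / 2  # 18 - 6 = 12

# Exact null distribution: each choice of 3 ranks out of 7 is equally likely.
all_sums    <- combn(7, 3, FUN = sum)
p_two_sided <- 2 * mean(all_sums >= w_pretzel)  # 2 * 1/35

c(w_pretzel, w_r, p_two_sided)
```

Choosing 3 of 7 ranks gives the same 35 arrangements as choosing 4 of 7, which is why the solution's `7 choose 4` and the `combn(7, 3)` above agree.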

A summary of the class's entire data for the surface area of Milk Chocolate and Pretzel M&Ms follows.

    Milk Chocolate: mean =    sd =    N = 42.
    Pretzel:        mean =    sd =    N = 25.

*Q17 2pts) In mathematical terms, using Greek letters to represent parameters, what is the null hypothesis for an equal variance t-test comparing the surface areas of milk chocolate and pretzel M&Ms?
Ho: µ_milk = µ_pretzel, where µ represents the true mean surface area.

*Q18 2pts) What is the appropriate degrees of freedom for the test statistic?
42 + 25 - 2 = 65

*Q19 2pts) What is the rejection region at a 5% significance level for a two-sided alternative?
Reject Ho if the absolute value of the test statistic exceeds the critical value. A conservative estimate from Rice Table 4, using 60 df, is 2.000; from R, qt(0.975, 65) =

**Q20 7pts) What is the observed test statistic to two decimal places?
With the means and standard deviations from the summary above substituted in:
    > (mean_milk - mean_pretzel)/sqrt((41*sd_milk^2 + 24*sd_pretzel^2)*(1/42+1/25)/65)
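The Q20 expression is the pooled (equal variance) two-sample t statistic written out from summary statistics. Because the class's means and standard deviations did not survive in this copy, the sketch below wraps the formula in a hypothetical helper, `pooled_t()`, and checks it against `t.test()` on simulated data that is not the class data.

```r
# Hypothetical helper: pooled two-sample t statistic from summary statistics.
pooled_t <- function(mean1, sd1, n1, mean2, sd2, n2) {
  df  <- n1 + n2 - 2
  # Pooled variance: weighted average of the two sample variances.
  sp2 <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df
  (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))
}

# Simulated stand-in data (NOT the class data), same group sizes as the exam.
set.seed(1)
x <- rnorm(42, mean = 490, sd = 10)
y <- rnorm(25, mean = 500, sd = 50)

by_hand <- pooled_t(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
from_r  <- unname(t.test(x, y, var.equal = TRUE)$statistic)
all.equal(by_hand, from_r)

# Two-sided 5% critical value for 42 + 25 - 2 = 65 degrees of freedom.
qt(0.975, df = 65)
```

With n1 = 42 and n2 = 25 the weights inside `pooled_t()` become 41 and 24 and the divisor becomes 65, matching the Q20 expression term by term.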

While we used the Wilcoxon-Mann-Whitney Rank Sum test on the small selection of M&Ms and the equal variance t-test on the full dataset, we could have used the Wilcoxon-Mann-Whitney Rank Sum test in both cases. Define the relative efficiency for a two-sided 5% level test comparing the two statistical tests as the ratio of the tests' power under certain conditions, i.e. RE = Power(Wilcoxon-Mann-Whitney Rank Sum test) / Power(equal variance t-test). Calculate the relative efficiency under the following settings.

***Q21 4pts) A_milk chocolate ~ N(μ=490, σ=10). N_milk chocolate = 42. A_pretzel ~ N(μ=500, σ=50). N_pretzel = 25.

    # So one experiment looks like the following.
    Amilk = rnorm(n=42, mean=490, sd=10)
    Apret = rnorm(n=25, mean=500, sd=50)

    # The equal variance t-test and Wilcoxon test p-values are:
    t.test( Amilk, Apret, var.equal=TRUE)$p.value
    wilcox.test( Amilk, Apret)$p.value

    # So now we just need to put this all in a big loop and save the p-values.
    # The number of p-values < 0.05 divided by the number of loops is the power.
    Nloops = 10^6
    pvalst = rep( NA, Nloops )
    pvalsw = rep( NA, Nloops )
    for( loop in 1:Nloops ){
      Amilk = rnorm(n=42, mean=490, sd=10)
      Apret = rnorm(n=25, mean=500, sd=50)
      pvalst[loop] = t.test( Amilk, Apret, var.equal=TRUE)$p.value
      pvalsw[loop] = wilcox.test( Amilk, Apret)$p.value
    }
    powert = sum(pvalst < 0.05)/Nloops
    powerw = sum(pvalsw < 0.05)/Nloops
    RE = powerw / powert
    options(scipen=20) # don't use scientific notation
    c( powerw, powert, RE)

So the power is pretty low for both (<30%), but the Wilcoxon test is only about 81% as efficient as the equal variance t-test. However, this is hiding an insidious fact. Consider the following simulation.

    pvalst = rep( NA, Nloops )
    pvalsu = rep( NA, Nloops ) # unequal var t-test
    pvalsw = rep( NA, Nloops )
    for( loop in 1:Nloops ){
      Amilk = rnorm(n=42, mean=500, sd=10) # identical means
      Apret = rnorm(n=25, mean=500, sd=50)
      pvalst[loop] = t.test( Amilk, Apret, var.equal=TRUE)$p.value
      pvalsu[loop] = t.test( Amilk, Apret, var.equal=FALSE)$p.value
      pvalsw[loop] = wilcox.test( Amilk, Apret)$p.value
    }
    powert = sum(pvalst < 0.05)/Nloops
    poweru = sum(pvalsu < 0.05)/Nloops
    powerw = sum(pvalsw < 0.05)/Nloops
    c( powerw, powert, poweru)
    # This is Type I error. Only the Wilcoxon test has the proper Type I error rate for this situation.

***Q22 4pts) A_milk chocolate ~ t_df=3(μ=490, σ=10). N_milk chocolate = 42. A_pretzel ~ t_df=3(μ=500, σ=50). N_pretzel = 25. By t_df=3(μ, σ), I mean a standard t_df=3 distribution scaled and shifted to have mean μ and standard deviation σ.

This is similar to Q21, but has a challenge in figuring out how to simulate the data. Let's try the default df=3 in R.

    > x = rt(n=10^7,df=3)
    > mean(x)
    # That's pretty close to 0, which we know is right.
    > sd(x)
    # That's nowhere near 1. What's up? A quick search on Wikipedia tells us
    # Var(t df=3) = 3. So the sd = sqrt(3) = ~1.732

    # So one experiment looks like the following.
    Amilk = rt(n=42, df=3)*10/sqrt(3)
    Apret = rt(n=25, df=3)*50/sqrt(3)

    # The equal variance t-test and Wilcoxon test p-values are:
    t.test( Amilk, Apret, var.equal=TRUE)$p.value
    wilcox.test( Amilk, Apret)$p.value

    # So now we just need to put this all in a big loop and save the p-values.
    # The number of p-values < 0.05 divided by the number of loops is the power.
    Nloops = 10^5
    pvalst = rep( NA, Nloops )
    pvalsw = rep( NA, Nloops )
    for( loop in 1:Nloops ){
      Amilk = rt(n=42, df=3)*10/sqrt(3)
      Apret = rt(n=25, df=3)*50/sqrt(3)
      pvalst[loop] = t.test( Amilk, Apret, var.equal=TRUE)$p.value
      pvalsw[loop] = wilcox.test( Amilk, Apret)$p.value
    }
    powert = sum(pvalst < 0.05)/Nloops
    powerw = sum(pvalsw < 0.05)/Nloops
    RE = powerw / powert
    options(scipen=20) # don't use scientific notation
    c( powerw, powert, RE)

So fairly low power for both tests (<40%), but the Wilcoxon test was ~10% more powerful. Note there is a lot more that I could do with this, including creating a CI (this is essentially a relative risk if we ignore the paired nature of the data) and performing a McNemar's test to see if the powers are statistically different (utilizing the paired nature of the data).
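The closing remark about McNemar's test can be made concrete: both tests are applied to the same simulated data set in each loop, so their reject / do-not-reject decisions are paired. The sketch below is one way to set that up, not the instructor's code. It assumes a much smaller `n_loops` of 2000 and an arbitrary seed to keep the run short, and it adds the location shifts 490 and 500 from the Q22 problem statement.

```r
set.seed(1)       # arbitrary seed, chosen only for reproducibility
n_loops <- 2000   # assumption: far fewer loops than the solutions use

reject_t <- logical(n_loops)
reject_w <- logical(n_loops)
for (i in seq_len(n_loops)) {
  # Scaled and shifted t(df = 3) draws, as described in Q22.
  a_milk <- rt(42, df = 3) * 10 / sqrt(3) + 490
  a_pret <- rt(25, df = 3) * 50 / sqrt(3) + 500
  reject_t[i] <- t.test(a_milk, a_pret, var.equal = TRUE)$p.value < 0.05
  reject_w[i] <- wilcox.test(a_milk, a_pret)$p.value < 0.05
}

# Cross-tabulate the paired decisions; fixing the factor levels guarantees
# a 2x2 table even if one test never (or always) rejects.
paired <- table(
  t_test   = factor(reject_t, levels = c(FALSE, TRUE)),
  wilcoxon = factor(reject_w, levels = c(FALSE, TRUE))
)
paired

# McNemar's test compares the two rejection rates using only the loops
# where the two tests disagreed.
mcnemar.test(paired)
```

The estimated powers are `mean(reject_t)` and `mean(reject_w)`, so their ratio is the relative efficiency defined before Q21.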

***Q23 4pts) A_milk chocolate ~ exponential(μ=490, σ=10). N_milk chocolate = 42. A_pretzel ~ exponential(μ=500, σ=50). N_pretzel = 25. By exponential(μ, σ), I mean a standard exponential distribution scaled and shifted to have mean μ and standard deviation σ.

Cole's solutions for all three:

    set.seed(68)
    sims <- 10^6
    res <- data.frame(matrix(0, ncol=3, nrow=sims))
    wes <- data.frame(matrix(0, ncol=3, nrow=sims))
    for(i in seq(sims)) {
      # generate normal distribution
      a1=rnorm(42,490,10)
      b1=rnorm(25,500,50)
      if(t.test(a1,b1,var.equal=TRUE)$p.value <= 0.05) res[i,1] <- 1
      if(wilcox.test(a1, b1)$p.value <= 0.05) wes[i,1] <- 1
      # generate t distribution
      a2=rt(42, 3)*10/sqrt(3)+490
      b2=rt(25, 3)*50/sqrt(3)+500
      if(t.test(a2,b2,var.equal=TRUE)$p.value <= 0.05) res[i,2] <- 1
      if(wilcox.test(a2, b2)$p.value <= 0.05) wes[i,2] <- 1
      # generate exponential distribution
      a3=rexp(42, 1/10)+480
      b3=rexp(25, 1/50)+450
      if(t.test(a3,b3,var.equal=TRUE)$p.value <= 0.05) res[i,3] <- 1
      if(wilcox.test(a3, b3)$p.value <= 0.05) wes[i,3] <- 1
    }
    # colMeans(wes) contains the estimated power of each Wilcoxon test
    # colMeans(res) contains the estimated power of each t-test
    round( colMeans(wes)/colMeans(res), 3) # relative efficiency
    # Note, I stopped the simulation early at i =
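The shifts of 480 and 450 in the exponential draws above follow from a property of the exponential distribution: with rate 1/σ, both its mean and its standard deviation equal σ, so adding μ - σ moves the mean to μ while leaving the standard deviation at σ. A minimal sketch, using a hypothetical helper `rexp_shifted()` that is not part of the original solutions:

```r
# Hypothetical helper: exponential draws with mean mu and standard deviation sigma.
rexp_shifted <- function(n, mu, sigma) {
  # rexp(rate = 1/sigma) has mean sigma and sd sigma; shift the mean to mu.
  rexp(n, rate = 1 / sigma) + (mu - sigma)
}

set.seed(68)
x_milk <- rexp_shifted(1e6, mu = 490, sigma = 10)  # same as rexp(n, 1/10) + 480
x_pret <- rexp_shifted(1e6, mu = 500, sigma = 50)  # same as rexp(n, 1/50) + 450

# Empirical check of the target means and standard deviations.
round(c(mean(x_milk), sd(x_milk)), 2)
round(c(mean(x_pret), sd(x_pret)), 2)
```

The same scale-then-shift idea is what turns `rt(n, 3)` into a draw with standard deviation σ in Q22, where the scale factor is σ/sqrt(3).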

***Q24 12pts) Previously we had looked at the volume of M&Ms assuming they were spherical or oblate spheroids. Based on the class's data that made it into REDCap, the Peanut M&Ms were more irregularly shaped than we had assumed. Here are the means and standard deviations for the height, width, and depth radii (diameters/2).

    h = height radius. mean(h) =    mm. sd(h) =    mm.
    w = width radius.  mean(w) =    mm. sd(w) =    mm.
    d = depth radius.  mean(d) =    mm. sd(d) =    mm.
    N = 32 peanut M&Ms measured.

The volume of an ellipsoid is: vol = 4pi/3 * hwd. Using the delta method, derive a formula and calculate a 95% confidence interval for the volume of a Peanut M&M. Express your solution as (LB, UB) to two decimal places.

    Var[ log(vol) ] = Var[ log( 4pi/3 * hwd ) ]
      = Var[ log( 4pi/3 ) + log( h ) + log( w ) + log( d ) ]
      = Var[ log( h ) ] + Var[ log( w ) ] + Var[ log( d ) ]      by independence and Var[constant] = 0
      ≈ Var[ h ] * (1/h^2) + Var[ w ] * (1/w^2) + Var[ d ] * (1/d^2)      by the delta method
      = sd_h^2/(N*h^2) + sd_w^2/(N*w^2) + sd_d^2/(N*d^2)      since h, w, d are sample means, each with variance sd^2/N

Assuming normality for the sampling distribution of log(vol), a 95% CI will be

    exp{ log( 4pi/3 * hwd ) ± 1.96 * sqrt( sd_h^2/(N*h^2) + sd_w^2/(N*w^2) + sd_d^2/(N*d^2) ) }

With N = 32 and the summary values above:

    exp( log( 4*pi/3 * h * w * d ) ± 1.96 * sqrt( sd_h^2/32/h^2 + sd_w^2/32/w^2 + sd_d^2/32/d^2 ) ) = ( , ).

For comparison, let's create a CI based on the individual estimated volumes for each M&M.

    a = c( ,1979.2, , , , , , , , , , , , ,1810.6,994.84,923.63,1810.6, ,293.22, ,1504.3, ,376.99, ,293.22, , , ,282.74, , )
    > mean(a) - 1.96*sd(a)/sqrt(length(a))
    > mean(a) + 1.96*sd(a)/sqrt(length(a))
    ( , )
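The delta-method interval in Q24 translates into a few lines of R. Because the measured means and standard deviations are missing from this copy, the sketch below uses a hypothetical helper, `delta_ci_volume()`, and made-up placeholder inputs that are not the class data.

```r
# Hypothetical helper: 95% CI for ellipsoid volume 4*pi/3 * h*w*d, built on
# the log scale with the delta method and then back-transformed.
delta_ci_volume <- function(h, w, d, sd_h, sd_w, sd_d, n, z = 1.96) {
  log_vol <- log(4 * pi / 3 * h * w * d)
  # Var[log(mean)] is approximately (sd^2 / n) / mean^2 for each radius.
  se_log <- sqrt(sd_h^2 / n / h^2 + sd_w^2 / n / w^2 + sd_d^2 / n / d^2)
  exp(log_vol + c(-1, 1) * z * se_log)
}

# Placeholder inputs in mm (NOT the class data); n = 32 as in the exam.
ci <- delta_ci_volume(h = 6, w = 8, d = 10,
                      sd_h = 0.5, sd_w = 0.6, sd_d = 0.9, n = 32)
round(ci, 2)
```

Because the interval is symmetric on the log scale, the point estimate 4*pi/3 * h*w*d is the geometric mean of the two bounds rather than their midpoint.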
