Psych 10 / Stats 60, Practice Problem Set 5 (Week 5 Material) Part 1: Power (and building blocks of power)


Derek McCormick
5 years ago

Part 1: Power (and building blocks of power)

1. A researcher plans to do a two-tailed hypothesis test with a sample of n = 100 people and a significance level of α = .05. For each change, describe how it would affect (a) the probability of Type I Error, (b) the probability of Type II Error, and (c) the power of the test.
a. changing to a significance level of α = .01
decrease probability of Type I Error, increase probability of Type II Error, decrease power of the test
b. changing to a one-tailed test with a significance level of α = .05
no effect on probability of Type I Error, and assuming the researcher had been correct about the direction of the effect, it would decrease the probability of Type II Error and increase power
c. increasing the sample size to n = 500
no effect on probability of Type I Error, decrease probability of Type II Error, increase power
d. (outside of the experimenter's control), the difference between the true population mean and the null hypothesis increased
no effect on probability of Type I Error, decrease probability of Type II Error, increase power
e. (outside of the experimenter's control), the standard deviation of the population increased
no effect on probability of Type I Error, increase probability of Type II Error, decrease power
note: more generally, the only thing that affects the probability of Type I Error is alpha, and anything that increases the probability of Type II Error will decrease power (since the probability of Type II Error and power must sum to 1)

2. For each scenario, find the sample mean that corresponds to the z-statistic. (You also might want to think about whether this procedure would change if you were instead finding a sample mean corresponding to a t-statistic).
The procedure would not change for a sample mean corresponding to a t-statistic (other than using s in place of σ to compute the standard error).
a. μ0 = 100, σ = 20, n = 17, z = 1.24
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 100 + 1.24 * (20 / √17) = 106.01
b. μ0 = 100, σ = 20, n = 17, z = -.83
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 100 + (-.83) * (20 / √17) = 95.97
c. μ0 = 100, σ = 10, n = 17, z = 1.24
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 100 + 1.24 * (10 / √17) = 103.01
d. μ0 = 100, σ = 20, n = 100, z = 1.24

x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 100 + 1.24 * (20 / √100) = 102.48
e. μ0 = 120, σ = 20, n = 17, z = 1.24
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 120 + 1.24 * (20 / √17) = 126.01
Take a moment to compare your answers for a-e, and notice how each change from the scenario in a changes the answer and why.
f. μ0 = -20, σ = 3, n = 6, z = -2.30
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = -20 + (-2.30) * (3 / √6) = -22.82
g. μ0 = -86, σ = 1000, n = 25, z = .64
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = -86 + .64 * (1000 / √25) = 42
h. μ0 = 337, σ = 7, n = 1000, z = 2.00
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 337 + 2.00 * (7 / √1000) = 337.44
i. μ0 = 0, σ = 10, n = 200, z = -1.00
x̄ = μ0 + z * σ_x̄ = μ0 + z * (σ / √n) = 0 + (-1.00) * (10 / √200) = -0.71

3. For each set of values, calculate the probability that we would observe sample data that would allow us to reject the null hypothesis that µ = μ0, if in fact the true population mean is µ = μ_guess.
note: making a sketch for these problems is highly recommended, and is the best way to keep track of whether you are looking for probability in the upper vs. lower tail
also note: there was no value for α or instructions for a one- vs. two-tailed test in the version of the problem set that was originally posted. When in doubt, for this class, use α = .05 and a two-tailed test.
a. μ0 = 100, σ = 20, n = 25, μ_guess = 110
i. first, figure out x̄_critical (which is in reference to the distribution under μ0).
we'll reject H0 if z < -1.96 or z > 1.96
since μ_guess is > μ0, we're interested in the sample mean, x̄_critical, that corresponds to z_critical = 1.96
x̄_critical = μ0 + z_critical * σ_x̄ = μ0 + z_critical * σ / √n = 100 + 1.96 * 20 / √25 = 107.84
ii. second, figure out the z-value that corresponds to x̄_critical in the distribution under μ_guess, and calculate the probability of falling beyond that z-value.
in this case we are interested in finding a sample mean that gives us a z-statistic greater than this z-value, since μ_guess is > μ0
z = (x̄_critical - μ_guess) / σ_x̄ = (x̄_critical - μ_guess) / (σ / √n) = (107.84 - 110) / (20 / √25) = -.54
p(z > -.54) = pnorm(-.54, lower.tail=FALSE) = .71
this tells us that if the true population mean is actually 110 (and the true population standard deviation is actually 20), and we sample n = 25 people, we will have a 71% chance of correctly rejecting H0 that μ = 100
b. μ0 = 50, σ = 1, n = 25, μ_guess = 49
i. first, figure out x̄_critical (which is in reference to the distribution under μ0).

we'll reject H0 if z < -1.96 or z > 1.96
since μ_guess is < μ0, we're interested in the sample mean, x̄_critical, that corresponds to z_critical = -1.96
x̄_critical = μ0 + z_critical * σ_x̄ = μ0 + z_critical * σ / √n = 50 + (-1.96) * 1 / √25 = 49.61
ii. second, figure out the z-value that corresponds to x̄_critical in the distribution under μ_guess, and calculate the probability of falling beyond that z-value.
in this case we are interested in finding a sample mean that gives us a z-statistic less than this z-value, since μ_guess is < μ0
z = (x̄_critical - μ_guess) / σ_x̄ = (x̄_critical - μ_guess) / (σ / √n) = (49.61 - 49) / (1 / √25) = 3.05
p(z < 3.05) = pnorm(3.05, lower.tail=TRUE) = .999
this tells us that if the true population mean is actually 49 (and the true population standard deviation is actually 1), and we sample n = 25 people, we will have a 99.9% chance of correctly rejecting H0 that μ = 50
c. μ0 = -25, σ = 10, n = 36, μ_guess = -27
i. first, figure out x̄_critical (which is in reference to the distribution under μ0).
we'll reject H0 if z < -1.96 or z > 1.96
since μ_guess is < μ0, we're interested in the sample mean, x̄_critical, that corresponds to z_critical = -1.96
x̄_critical = μ0 + z_critical * σ_x̄ = μ0 + z_critical * σ / √n = -25 + (-1.96) * 10 / √36 = -28.27
ii. second, figure out the z-value that corresponds to x̄_critical in the distribution under μ_guess, and calculate the probability of falling beyond that z-value.
in this case we are interested in finding a sample mean that gives us a z-statistic less than this z-value, since μ_guess is < μ0
z = (x̄_critical - μ_guess) / σ_x̄ = (x̄_critical - μ_guess) / (σ / √n) = (-28.27 - (-27)) / (10 / √36) = -.76
p(z < -.76) = pnorm(-.76, lower.tail=TRUE) = .22
this tells us that if the true population mean is actually -27 (and the true population standard deviation is actually 10), and we sample n = 36 people, we will have a 22% chance of correctly rejecting H0 that μ = -25
d. μ0 = 0, σ = 1000, n = 16, μ_guess = 200
i. first, figure out x̄_critical (which is in reference to the distribution under μ0).
we'll reject H0 if z < -1.96 or z > 1.96
since μ_guess is > μ0, we're interested in the sample mean, x̄_critical, that corresponds to z_critical = 1.96
x̄_critical = μ0 + z_critical * σ_x̄ = μ0 + z_critical * σ / √n = 0 + 1.96 * 1000 / √16 = 490
ii. second, figure out the z-value that corresponds to x̄_critical in the distribution under μ_guess, and calculate the probability of falling beyond that z-value.
in this case we are interested in finding a sample mean that gives us a z-statistic greater than this z-value, since μ_guess is > μ0
z = (x̄_critical - μ_guess) / σ_x̄ = (x̄_critical - μ_guess) / (σ / √n) = (490 - 200) / (1000 / √16) = 1.16
p(z > 1.16) = pnorm(1.16, lower.tail=FALSE) = .12

this tells us that if the true population mean is actually 200 (and the true population standard deviation is actually 1000), and we sample n = 16 people, we will have a 12% chance of correctly rejecting H0 that μ = 0

You might notice that the power in the last two scenarios is very poor! Even if the effect exists, we most likely wouldn't be able to detect it with the current study design, so we might wonder, why even do the study at all? In these scenarios the sample size is the only thing that is under the control of the researcher / statistician, so they might decide to increase their sample size or to not invest their time and energy in pursuing this particular question, since the odds are low that they will make the correct decision in their test (if their assumptions about the true population mean and standard deviation are correct).

4. A team captain reads that if you start with a penny that is facing tails up, it has a 51% chance of landing facing tails up. He is wondering if this is true, because if it is, he should use this strategy when calling the coin toss at the beginning of games. He flips a coin 10 times (starting with tails up) and finds that he does not have enough evidence to reject the null hypothesis that π = .50. Why is the power of the test important here, and how might understanding power influence his conclusions?
We could do a formal power calculation here, but we can notice right away that the power of this test is likely very low. Even if the true process proportion is π = .51, this is going to yield a distribution of sample proportions that looks a lot like the distribution of sample proportions we would see with a process proportion of π = .50, so we would need a very large sample size (much greater than 10) to have a good chance of detecting this effect. By understanding this, we can realize that failing to reject H0 doesn't tell us very much here, because we probably weren't going to be able to reject H0 even if π = .51.
(And this is one of the primary reasons why, if we fail to reject H0, we do not conclude that H0 is true.)
(This is beyond the scope of the material for this class, but if you wanted to do a formal power calculation you could use qbinom() or some trial and error with dbinom() to figure out that we would need to observe either 9 or 10 successes to reject H0 using α = .05, two-tailed (or one-tailed, it turns out). You could then use pbinom() or dbinom() to ask about the probability of observing 9 or more successes if π = .51, and you would find that the power of the test is .013. In other words, he only has a 1.3% chance of detecting the effect if it exists (i.e., a 1.3% chance of rejecting H0 and finding evidence that π > .50), so failing to reject the null hypothesis is not informative here. You might also then notice that he has a .91% chance of getting 0 or 1 tails, which would lead him to reject H0 and conclude that the true population proportion, π, is less than .50 (he would conclude that if the coin starts on tails it is most likely to land on heads). This is not a good situation to be in, since his power to draw a correct conclusion is extremely close to his probability of drawing a very incorrect conclusion.)
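The two power calculations in Part 1 (the two-step z procedure from Problem 3 and the binomial calculation described for Problem 4) can be sketched in R roughly as follows. This is an illustration rather than part of the original answer key: the names z_power, upper_crit, lower_crit, power_upper, and wrong_tail are made up here, and, like the worked answers, z_power only counts the rejection tail on the same side as μ_guess.

```r
# Power of a two-tailed z-test for one mean, following Problem 3's two steps.
# Assumption: only the tail on the side of mu_guess is counted (the far tail
# contributes a negligible amount in these scenarios).
z_power <- function(mu0, sigma, n, mu_guess, alpha = 0.05) {
  se <- sigma / sqrt(n)            # standard error of the sample mean
  z_crit <- qnorm(1 - alpha / 2)   # about 1.96 for alpha = .05
  if (mu_guess > mu0) {
    x_crit <- mu0 + z_crit * se    # step i: critical sample mean under mu0
    pnorm((x_crit - mu_guess) / se, lower.tail = FALSE)  # step ii
  } else {
    x_crit <- mu0 - z_crit * se
    pnorm((x_crit - mu_guess) / se, lower.tail = TRUE)
  }
}

power_3 <- c(a = z_power(100, 20, 25, 110),
             b = z_power(50, 1, 25, 49),
             c = z_power(-25, 10, 36, -27),
             d = z_power(0, 1000, 16, 200))
round(power_3, 3)

# Binomial power for Problem 4: 10 flips, H0: pi = .50, true pi = .51.
n_flips <- 10
upper_crit <- qbinom(0.975, n_flips, 0.5) + 1   # smallest count that rejects
lower_crit <- qbinom(0.025, n_flips, 0.5) - 1   # largest count that rejects
# Chance of rejecting in the correct direction (too many tails)
power_upper <- pbinom(upper_crit - 1, n_flips, 0.51, lower.tail = FALSE)
# Chance of rejecting in the wrong direction (too few tails)
wrong_tail <- pbinom(lower_crit, n_flips, 0.51)
round(c(power_upper = power_upper, wrong_tail = wrong_tail), 4)
```

Because R carries z_critical at full precision (1.959964 rather than 1.96), the results can differ from the hand calculations in the third decimal place. Re-running z_power with a larger n is a quick way to see how much sample size the last two scenarios would need.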

Part 2: Required sample sizes for desired margin of error

1. (Adapted from Tintle 3.CE.5). The margin of error for a 95% confidence interval for a process (or population) probability, π, can be approximated using 1 / √n.
a. Calculate what the margin of error for a 95% confidence interval will be for the following sample sizes: 100, 400, 1000, 2000, 8000, 9000. (Note, vector operations in R would be very useful here).
> n <- c(100, 400, 1000, 2000, 8000, 9000)
> margin <- 1 / sqrt(n)
> margin
[1] 0.10000000 0.05000000 0.03162278 0.02236068 0.01118034 0.01054093
b. Sketch your results by hand or plot your results in R (make sure to define the x axis and the y axis if you use the plot() function), and describe how the margin of error changes with the sample size.
> plot(n, margin)
c. In order to cut the margin of error in half, by how many times must you increase your sample size?
we want (bigger margin of error) = 2 * (smaller margin of error)
1 / √(n_bigger_margin) = 2 * 1 / √(n_smaller_margin)
(1 / √(n_bigger_margin))^2 = (2 * 1 / √(n_smaller_margin))^2
1 / n_bigger_margin = 4 * 1 / n_smaller_margin
n_smaller_margin = 4 * n_bigger_margin
we need to multiply our sample size by 4 in order to cut the margin of error in half
d. Which would have a bigger impact on the margin of error: increasing the sample size from 100 to 400 or increasing the sample size from 8000 to 9000? (And explain why).
Increasing the sample size from 100 to 400 is a larger relative increase (4x) than increasing the sample size from 8000 to 9000 (1.125x), and will have a bigger impact on the margin of error. Because the margin of error depends on 1 / √n, we get less bang for our buck when we increase n once n is already large (diminishing returns).

2. For each scenario, compute the required sample size to get a confidence interval with at most the specified margin of error. (Remember that we are only considering confidence intervals for two-tailed tests).
- Note that this question is referring to confidence intervals for single means. This wasn't explicitly stated in the original version of the problem set, but the standard deviations and the sizes of the margins of error are too large to be referring to proportions, which are bounded between 0 and 1.
- Also note that σ indicates we should be using z-statistics (known population standard deviation), and s indicates that we should be using t-statistics (estimated population standard deviation). However, because we cannot know df for a t-statistic without knowing the sample size, we use a z-statistic for a sample size calculation for a t-test, justified by the fact that the required sample size is typically large enough that there is minimal difference between the t distribution (with df based on the required sample size) and the z distribution.
- Also note that we always round up for required sample size calculations, because we want to know how many observations we need to sample (once, not on average in the long run, so we need a whole number) to get at most the specified margin of error, and increasing n decreases the margin of error.
a. σ = 10, α = .05, margin of error of at most 6
margin = z_critical(α=.05) * σ_x̄ = z_critical(α=.05) * σ / √n
√n = z_critical(α=.05) * σ / margin
n = (z_critical(α=.05) * σ / margin)^2
n = (1.96 * 10 / 6)^2
n = 10.67, round up to n = 11
b. σ = 20, α = .05, margin of error of at most 6
margin = z_critical(α=.05) * σ_x̄ = z_critical(α=.05) * σ / √n
√n = z_critical(α=.05) * σ / margin
n = (z_critical(α=.05) * σ / margin)^2
n = (1.96 * 20 / 6)^2
n = 42.68, round up to n = 43
c. σ = 10, α = .01, margin of error of at most 6
margin = z_critical(α=.01) * σ_x̄ = z_critical(α=.01) * σ / √n
√n = z_critical(α=.01) * σ / margin
n = (z_critical(α=.01) * σ / margin)^2
n = (2.58 * 10 / 6)^2
n = 18.49, round up to n = 19
d. s = 20, α = .01, margin of error of at most 2
margin = z_critical(α=.01) * s_x̄ = z_critical(α=.01) * s / √n
√n = z_critical(α=.01) * s / margin
n = (z_critical(α=.01) * s / margin)^2
n = (2.58 * 20 / 2)^2
n = 665.64, round up to n = 666
e. s = 15, α = .10, margin of error of at most 3
margin = z_critical(α=.10) * s_x̄ = z_critical(α=.10) * s / √n
√n = z_critical(α=.10) * s / margin
n = (z_critical(α=.10) * s / margin)^2
n = (1.64 * 15 / 3)^2

n = 67.24, round up to n = 68

Part 3: Sampling, causality, and generalization
Screenshots of relevant problems available on Canvas in the Files section.

Tintle et al., Chapter 3:
a. the coffee bars might have their longest waiting times at different times of the day (due to different customer patterns, staffing patterns, etc.). b. you might want to visit each coffee bar at multiple times throughout the day, or stay at each coffee bar for the entire day and sample customers at random throughout the day
this question has a double negative that makes giving a yes/no answer confusing
yes, the options are more clear, and rather than giving a yes/no answer, the respondent needs to clearly state their opinion
we would expect "assistance to the poor" to give a more favorable response rate toward these programs
responses will vary, there are many correct answers here
people might be uncomfortable answering "no" to an interviewer who is smoking (on the other hand, they might be particularly bothered by the smoke and become more inclined to say "yes")
answers will vary and there are reasonable explanations for a number of answers, but standard answers would be: a. underestimate or overestimate (people are generally not great at estimating how much sleep they get), b. overestimate or overstate, c. overstate, d. overestimate or overstate, e. overstate

Tintle et al., Chapter 4:
the weather is colder and snowier during these months, and these weather patterns might also be related to heart attacks (you might also generate other explanations)
no, number of TVs is likely a proxy for wealth or development in a country, which is linked to life expectancy
because we don't think there is a cause-and-effect relationship between number of TVs and life expectancy, changing the number of TVs in a country would not change life expectancy
a. the explanatory (grouping) variable is Mediterranean diet vs. no Mediterranean diet, and the response variable is memory and cognitive skills. b. there are many possible explanations here. One is that people who choose a Mediterranean diet might tend to be wealthy (a Mediterranean diet tends to be expensive) or might share other socioeconomic similarities that are also linked to memory and cognitive skills
a. teenagers, b. the explanatory (grouping) variable is whether or not the teenager eats family dinners, and the response variable is the teenager's drug use, c. there are many possible explanations here. One is that family dinners require parents to be home during dinner time, and the feasibility of this is linked to socioeconomic status (as is drug use); another is parents' drug use or other genetic factors that might influence both teenage drug use and the presence or absence of family dinners
a. the explanatory (grouping) variable is children vs. no children, which is categorical, b. the response variable is life span (lifetime), which is quantitative, c. the

dotplots suggest that the central tendency for lifetime is higher for men with children, d. however, the dotplots also highlight that many men do not have children until they reach a certain age, and so by definition we might expect men who have children to be older than men who did not have children, and thus if we take a sample of men who have died, we would also expect men who had children to be older than men who did not have children
a (c is very important, but it relates to how you sample the larger pool of participants, before randomly assigning them to group)
c
a
b (it is the only scenario in which we wouldn't expect non-random pre-existing differences between the two groups)
a (it is the only scenario in which we wouldn't expect non-random pre-existing differences between the two groups)

Part 4: Comparing two proportions

HIV and AZT (adapted from Tintle et al.). In a 1994 study, 164 pregnant, HIV-positive women were randomly assigned to receive the drug AZT during pregnancy and 160 pregnant, HIV-positive women were randomly assigned to a control group that received a placebo. They found that 40 of the mothers in the control group gave birth to babies who were HIV-positive, compared to only 13 in the AZT group.
a. Is this an observational study or an experiment?
experiment, because women were randomly assigned to group
b. Identify the grouping variable and the response variable
grouping variable is AZT vs. placebo, response variable is whether there was an HIV-positive birth
c. Construct a 2 x 2 table of frequencies (counts), with the grouping variable in columns. Fill in the marginal frequencies too.
                      AZT   PLACEBO   TOTAL
HIV-POSITIVE BIRTH     13        40      53
HIV-NEGATIVE BIRTH    151       120     271
TOTAL                 164       160     324
d. Calculate p(born HIV-positive | placebo), p(placebo | born HIV-positive), p(born HIV-positive | AZT), and p(placebo | not born HIV-positive). Which two of these would we compare if we wanted to examine the effectiveness of AZT in preventing HIV-positive births?
p(born HIV-positive | placebo) = 40 / 160 = .25
p(placebo | born HIV-positive) = 40 / 53 = .75
[note, it is coincidence that the first two conditional probabilities sum to 1, this is not guaranteed]
p(born HIV-positive | AZT) = 13 / 164 = .079
p(placebo | not born HIV-positive) = 120 / 271 = .44
if we wanted to compare the effectiveness of AZT and placebo in preventing HIV-positive births, we would want to compare the proportion of HIV-positive births in the AZT vs. placebo groups, i.e., p(born HIV-positive | placebo) and p(born HIV-positive | AZT)
e. Compare the central tendency and variability of the placebo and AZT groups.
The mode of each group is HIV-negative birth. The AZT group is less variable than the placebo group (relative frequency at the mode is higher, .92 vs. .75)
f. Calculate the difference in sample proportions of HIV-positive births, p̂_placebo - p̂_AZT (note that, counterintuitively, this means we are labelling an HIV-positive birth a "success")
p̂_placebo = p(born HIV-positive | placebo) = 40 / 160 = .25
p̂_AZT = p(born HIV-positive | AZT) = 13 / 164 = .079
p̂_placebo - p̂_AZT = .25 - .079 = .171
g. State two hypotheses about the difference in population proportions, π_placebo - π_AZT, with the null hypothesis stating there is no relationship between placebo/AZT and HIV-positive births. Use two-tailed hypotheses.
H0: π_placebo - π_AZT = 0
HA: π_placebo - π_AZT ≠ 0
h. Use the Two Proportion applet to simulate a distribution of differences in sample proportions that we could observe if there was no relationship between placebo/AZT and HIV-positive births. Use this distribution to compute the probability of observing a sample difference as extreme as or more extreme than our sample difference, if there was no relationship between the two variables. (Or describe how you could simulate a single instance of a difference in sample proportions).
Exact answer will vary from simulation to simulation with the applet. Please see us in office hours for help using the applet that we used in lecture. The applet simulates the following procedure: you could shuffle 53 red cards and 271 blue cards, and randomly deal them into piles of 164 and 160, and calculate the difference in proportion of red cards in the two piles.
i. Use a theory-based method to
i. Describe the mean and standard deviation of the distribution of differences in sample proportions that we could observe if there was no relationship between placebo/AZT and HIV-positive births
μ_(p̂_placebo - p̂_AZT) = 0
σ_(p̂_placebo - p̂_AZT) = √( (π_pooled * (1 - π_pooled)) / n1 + (π_pooled * (1 - π_pooled)) / n2 )
calculate π_pooled = 53 / 324 = .16
σ_(p̂_placebo - p̂_AZT) = √( (.16 * (1 - .16)) / 164 + (.16 * (1 - .16)) / 160 )
σ_(p̂_placebo - p̂_AZT) = .041
(not asked, but we also assume the distribution is a normal distribution, which holds as long as the distributions of individual group proportions are normally distributed)
ii. Describe the location of our observed difference in sample proportions (p̂_placebo - p̂_AZT) in this distribution, using a z-statistic

z = ((p̂_placebo - p̂_AZT) - μ_(p̂_placebo - p̂_AZT)) / σ_(p̂_placebo - p̂_AZT)
z = (.171 - 0) / .041 = 4.17
iii. Use R to calculate the two-tailed p-value for that z-statistic, and test the null hypothesis using a significance level of α = .05.
p(z < -4.17 or z > 4.17) = pnorm(-4.17) * 2 ≈ .00003, which is less than α = .05, so we reject H0
j. Reflect on the choice of a two-tailed test with α = .05. Why might a two-tailed test have been a good idea here? Thinking about the consequences of potential Type I and Type II Errors in this scenario, do you have an argument for adjusting α?
It would be very important to know if AZT led to more or fewer HIV-positive births than a placebo, and so we are interested in making an inference regardless of the direction of our observed sample difference. A Type I Error would be erroneously detecting a relationship between AZT and HIV-positive births when one does not exist. A Type II Error would be concluding that we do not have enough evidence for a relationship between AZT and HIV-positive births when one does actually exist. If we were especially concerned about Type I Error we would want to decrease alpha, and if we were especially concerned about Type II Error we would want to increase alpha.
k. Calculate the measures of effect size of relative risk and number needed to treat, and interpret what they mean in this situation.
relative risk = p̂_larger / p̂_smaller = .25 / .079 = 3.16
Based on our best estimate from our sample proportions, it seems that a woman taking a placebo is 3.16 times as likely to have an HIV-positive birth relative to a woman taking AZT.
NNT = 1 / (p̂_placebo - p̂_AZT) = 1 / (.25 - .079) = 1 / .171 = 5.85
Based on our best estimate from our sample proportions, we would, on average, need to give 5.85 women AZT instead of a placebo to prevent a single HIV-positive birth.
l. Can we draw cause-and-effect conclusions from this study, i.e., does taking the drug AZT during pregnancy cause a reduction in HIV-positive births?
Yes, because women were randomly assigned to take AZT versus a placebo, so we don't need to worry that the two groups differ with respect to other, confounding variables (beyond what we would expect by random chance).

Praising children (adapted from Tintle et al.). Psychologists investigated whether praising a child's intelligence, rather than praising his / her effort, tends to have negative consequences such as undermining their motivation (Mueller and Dweck, 1998). Children participating in the study were given a set of problems to solve. After the first set of problems children were randomly assigned to be praised for either their intelligence or their effort. The children then were given another set of problems to solve and later told how many they got right. They were then asked to share the number that they got right with other students. Some of the children misrepresented (i.e., lied about) how many they got right. Of 59 children, 11 were praised for intelligence and misrepresented their score, 4 were praised for effort and misrepresented their score, 18 were praised for intelligence and did not misrepresent their score, and 26 were praised for effort and did not misrepresent their score. Researchers were interested in learning whether there was a difference in the proportion of children who lied, depending on how they were praised.
a. Is this an observational study or an experiment?
experiment, because children were randomly assigned to receive intelligence or effort praise
b. Identify the grouping variable and the response variable
the grouping variable is type of praise and the response variable is whether or not children misrepresented their score
c. Construct a 2 x 2 table of frequencies (counts), with the grouping variable in columns. Fill in the marginal frequencies too.
                        INTELLIGENCE   EFFORT   TOTAL
DID MISREPRESENT                  11        4      15
DID NOT MISREPRESENT              18       26      44
TOTAL                             29       30      59
d. Calculate p(misrepresented score | intelligence praise), p(misrepresented score | effort praise), p(praised for intelligence | misrepresented score), and p(praised for intelligence | did not misrepresent score). Which two of these would we compare if we wanted to examine the effect of type of praise on misrepresenting scores?
p(misrepresented score | intelligence praise) = 11 / 29 = .38
p(misrepresented score | effort praise) = 4 / 30 = .13
p(praised for intelligence | misrepresented score) = 11 / 15 = .73
p(praised for intelligence | did not misrepresent score) = 18 / 44 = .41
if we wanted to compare how intelligence vs. effort praise influences misrepresentation of scores, we would want to compare the proportion of children who misrepresent their scores in the intelligence vs. effort groups, i.e., p(misrepresented score | intelligence praise) and p(misrepresented score | effort praise)
e. Compare the central tendency and variability of the effort and intelligence praise groups.
In both groups the mode is "did not misrepresent score", but the intelligence praise group has greater variability than the effort praise group (relative frequency at the mode = .62 vs. .87)
f. Calculate the difference in sample proportions of misrepresentation of scores, p̂_intelligence - p̂_effort (note that, counterintuitively, this means we are labelling a misrepresentation a "success")
p̂_intelligence = p(misrepresented score | intelligence praise) = 11 / 29 = .38
p̂_effort = p(misrepresented score | effort praise) = 4 / 30 = .13
p̂_intelligence - p̂_effort = .38 - .13 = .25
g. State two hypotheses about the difference in population proportions, π_intelligence - π_effort, with the null hypothesis stating there is no relationship between type of praise and misrepresentation of scores. Use two-tailed hypotheses.
H0: π_intelligence - π_effort = 0
HA: π_intelligence - π_effort ≠ 0
h. Use the Two Proportion applet to simulate a distribution of differences in sample proportions that we could observe if there was no relationship between type of praise and misrepresentation of scores. Use this distribution to compute the probability of observing a sample difference as extreme as or more extreme than our sample difference, if there was no relationship between the two variables. (Or describe how you could simulate a single instance of a difference in sample proportions).
Exact answer will vary from simulation to simulation with the applet. Please see us in office hours for help using the applet that we used in lecture. The applet simulates the following procedure: you could shuffle 15 red cards and 44 blue cards, and randomly deal them into piles of 29 and 30, and calculate the difference in proportion of red cards in the two piles.
i. Use a theory-based method to
i. Describe the mean and standard deviation of the distribution of differences in sample proportions that we could observe if there was no relationship between type of praise and misrepresentation of scores
μ_(p̂_intelligence - p̂_effort) = 0
σ_(p̂_intelligence - p̂_effort) = √( (π_pooled * (1 - π_pooled)) / n1 + (π_pooled * (1 - π_pooled)) / n2 )
calculate π_pooled = 15 / 59 = .25
σ_(p̂_intelligence - p̂_effort) = √( (.25 * (1 - .25)) / 29 + (.25 * (1 - .25)) / 30 )
σ_(p̂_intelligence - p̂_effort) = .11
(not asked, but we also assume the distribution is a normal distribution, which holds as long as the distributions of individual group proportions are normally distributed)
ii. Describe the location of our observed difference in sample proportions (p̂_intelligence - p̂_effort) in this distribution, using a z-statistic
z = ((p̂_intelligence - p̂_effort) - μ_(p̂_intelligence - p̂_effort)) / σ_(p̂_intelligence - p̂_effort)
z = (.25 - 0) / .11 = 2.27
iii. Use R to calculate the two-tailed p-value for that z-statistic, and test the null hypothesis using a significance level of α = .05.
p(z < -2.27 or z > 2.27) = pnorm(-2.27) * 2 = .023, which is less than α = .05, so we reject H0
j. Calculate the measures of effect size of relative risk and number needed to treat, and interpret what they mean in this situation.
relative risk = p̂_larger / p̂_smaller = .38 / .13 = 2.92
Based on our best estimate from our sample proportions, it seems that a child praised for intelligence is 2.92 times as likely to misrepresent their score relative to a child praised for effort.
NNT = 1 / (p̂_intelligence - p̂_effort) = 1 / (.38 - .13) = 1 / .25 = 4.00
Based on our best estimate from our sample proportions, we would, on average, need to praise 4.00 children for effort instead of for intelligence to prevent a child misrepresenting their score. (Another way to phrase this is that we would, on average, need to praise 4.00 children for intelligence instead of for effort to find one more child who misrepresents their score).
k. Can we draw cause-and-effect conclusions from this study, i.e., does being praised for intelligence cause a change in misrepresentation of scores?
Yes, because children were randomly assigned to get effort versus intelligence praise, so we don't need to worry that the two groups differ with respect to other, confounding variables (beyond what we would expect by random chance).

Japanese comics (courtesy of Mia Lewis). A researcher is interested in comparing attributes of boys' and girls' comics in Japan. She samples the first page of n = 45 boys' comics and n = 54 girls' comics, and counts the number of first pages in each group that include sparkles. She finds that 12 of the boys' comics include sparkles and 43 of the girls' comics include sparkles.
a. Describe the population value that the researcher is interested in.
We want to know the difference in proportion of sparkles in the entire population of boys' comics vs. the entire population of girls' comics.
b. Identify the grouping variable and the response variable
The grouping variable is boys' vs. girls' comics, and the response variable is whether or not there are sparkles
c. Construct a 2 x 2 table of frequencies (counts), with the grouping variable in columns. Fill in the marginal frequencies too.
               BOYS   GIRLS   TOTAL
SPARKLES         12      43      55
NO SPARKLES      33      11      44
TOTAL            45      54      99
d. Compare the central tendency and variability of the boys' and girls' comics.
The mode for boys' comics is no sparkles and the mode for girls' comics is sparkles. The boys' comics have greater variability than the girls' comics (relative frequency at the mode is 33 / 45 = .73 vs. 43 / 54 = .80).
e. Calculate the difference in sample proportions of "has sparkles", p̂_boys - p̂_girls.
p̂_boys = 12 / 45 = .27
p̂_girls = 43 / 54 = .80
p̂_boys - p̂_girls = .27 - .80 = -.53
f. State two hypotheses about the difference in population proportions, π_boys - π_girls. Use two-tailed hypotheses.
H0: π_boys - π_girls = 0
HA: π_boys - π_girls ≠ 0
g. Use the Two Proportion applet to simulate a distribution of differences in sample proportions that we could observe if there was no relationship between type of comic and presence of sparkles. Use this distribution to compute the probability of observing a sample difference as extreme as or more extreme than our sample difference, if

there was no relationship between the two variables. (Or describe how you could simulate a single instance of a difference in sample proportions).
Exact answer will vary from simulation to simulation with the applet. Please see us in office hours for help using the applet that we used in lecture. The applet simulates the following procedure: you could shuffle 55 red cards and 44 blue cards, and randomly deal them into piles of 45 and 54, and calculate the difference in proportion of red cards in the two piles.
h. Use a theory-based method to
i. Describe the mean and standard deviation of the distribution of differences in sample proportions that we could observe if there was no relationship between type of comic and presence of sparkles
μ_(p̂_boys - p̂_girls) = 0
σ_(p̂_boys - p̂_girls) = √( (π_pooled * (1 - π_pooled)) / n1 + (π_pooled * (1 - π_pooled)) / n2 )
calculate π_pooled = 55 / 99 = .56
σ_(p̂_boys - p̂_girls) = √( (.56 * (1 - .56)) / 45 + (.56 * (1 - .56)) / 54 )
σ_(p̂_boys - p̂_girls) = .10
(not asked, but we also assume the distribution is a normal distribution, which holds as long as the distributions of individual group proportions are normally distributed)
ii. Describe the location of our observed difference in sample proportions (p̂_boys - p̂_girls) in this distribution, using a z-statistic
z = ((p̂_boys - p̂_girls) - μ_(p̂_boys - p̂_girls)) / σ_(p̂_boys - p̂_girls)
z = (-.53 - 0) / .10 = -5.30
iii. Use R to calculate the two-tailed p-value for that z-statistic, and test the null hypothesis using a significance level of α = .05.
p(z < -5.30 or z > 5.30) = pnorm(-5.30) * 2 ≈ .0000001, which is less than α = .05, so we reject H0
i. Calculate the measures of effect size of relative risk and number needed to treat, and interpret what they mean in this situation.
relative risk = p̂_larger / p̂_smaller = .80 / .27 = 2.96
Based on our best estimate from our sample proportions, it seems that a girls' comic book is 2.96 times as likely to have sparkles as a boys' comic book.
NNT = 1 / |p̂_boys - p̂_girls| = 1 / |.27 - .80| = 1 / .53 = 1.89
Based on our best estimate from our sample proportions, we would, on average, need to sample 1.89 girls' comic books instead of boys' comic books to find one more comic book with sparkles. (Another way of saying this is that we would, on average, need to sample 1.89 boys' comic books instead of girls' comic books to find one fewer comic book with sparkles).
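The theory-based steps used for all three Part 4 data sets (pooled proportion, standard deviation of the difference, z-statistic, two-tailed p-value, relative risk, NNT) can be collected into one small R function, with a second function that mimics the card-shuffling procedure the applet simulates. This is only a sketch: two_prop_summary and simulate_null_diff are hypothetical helper names, not functions from the course or from any package, and the seed and number of shuffles are arbitrary choices.

```r
# Theory-based comparison of two proportions with a pooled standard deviation,
# plus relative risk and number needed to treat (NNT).
# x1, n1: number of "successes" and group size for group 1; x2, n2: group 2.
two_prop_summary <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  diff <- p1 - p2
  pooled <- (x1 + x2) / (n1 + n2)
  se <- sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
  z <- (diff - 0) / se                 # null hypothesis: difference = 0
  p_value <- 2 * pnorm(-abs(z))        # two-tailed
  c(diff = diff, pooled = pooled, se = se, z = z, p_value = p_value,
    relative_risk = max(p1, p2) / min(p1, p2),
    nnt = 1 / abs(diff))
}

hiv    <- two_prop_summary(40, 160, 13, 164)  # placebo vs. AZT
praise <- two_prop_summary(11, 29, 4, 30)     # intelligence vs. effort
comics <- two_prop_summary(12, 45, 43, 54)    # boys' vs. girls' comics
round(rbind(hiv, praise, comics), 3)

# One simulated difference under "no relationship": shuffle all outcomes
# (1 = red card / success, 0 = blue card) and deal them into two piles.
simulate_null_diff <- function(x1, n1, x2, n2) {
  outcomes <- c(rep(1, x1 + x2), rep(0, n1 + n2 - x1 - x2))
  shuffled <- sample(outcomes)
  mean(shuffled[1:n1]) - mean(shuffled[(n1 + 1):(n1 + n2)])
}

set.seed(1)  # arbitrary seed so the run is repeatable
null_diffs <- replicate(10000, simulate_null_diff(40, 160, 13, 164))
# Proportion of shuffles at least as extreme as the observed HIV difference
mean(abs(null_diffs) >= abs(hiv[["diff"]]))
```

Because the function keeps full precision instead of rounding the proportions and the standard deviation first, its z-statistics differ a little from the hand calculations (for example, about 2.17 rather than 2.27 for the praise data), though the conclusions at α = .05 are the same.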
