STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013
Announcements
- All work (besides projects) should be returned to you and should be entered on Sakai.
- Office hour: 2-3pm today (Old Chem 114)
- Final exam: 9am-12pm HERE
  - Bring a calculator; no cell phones, laptops, tablets, etc.
  - Allowed: one 8 1/2 x 11 inch cheat sheet with notes on both sides. You must create this yourself.
Topics for today
- different conditions and hypotheses in each test (and a list of tests)
- when to use Z, T, F stats, 2 vs 1 sample tests, degrees of freedom
- pooled variance, pooled proportion (when to use)
- confidence intervals: is it always two-sided?, how to interpret CI of difference b/w 2 means
- chi-square: what we are testing, when to use, how to approach hypotheses
- ANOVA (filling in the chart)
- Regression: interpreting linear lines and writing them, correlation, residuals
- Type I, II error
- Bayesian probability won't be on there (but cond'l probability might)
- MLR won't be on there
What you need to know about HTs:
Format for answering a hypothesis test question:
1. State the null and alternative hypotheses
2. Check conditions
3. Calculate the test statistic (T, Z, etc.) and standard error (if needed)
4. Calculate the p-value (double if two-sided hypothesis)
5. Reject or fail to reject the null hypothesis
6. Interpret your decision in context of the problem
Know how to interpret a p-value in context
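As an illustration of steps 3 to 5, here is a minimal Python sketch for a one-proportion Z test. The function name and the data (60 successes in 100 trials, H0: p = 0.5) are hypothetical and chosen only for the example; conditions (step 2) and the interpretation in context (step 6) still have to be done by hand.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, two_sided=True):
    """Steps 3-4 of the HT format for one proportion (hypothetical helper)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)         # in a HT, the SE uses the null value p0
    z = (p_hat - p0) / se                # test statistic
    tail = 1 - NormalDist().cdf(abs(z))  # area in one tail beyond |z|
    # Double the tail area for a two-sided alternative. The one-sided case
    # assumes the observed direction agrees with HA.
    p_value = 2 * tail if two_sided else tail
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5 vs HA: p != 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"  # step 5
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```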
What you need to know about CIs:
Format for answering a confidence interval question:
1. Check conditions
2. Find and state the critical value ($z^\star$, $t^\star_{df}$)
3. Calculate the standard error
4. Calculate the confidence interval
5. Interpret your confidence interval in context of the problem
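Steps 2 to 4 can be sketched in Python for a single proportion, where the critical value comes from the normal distribution. The function name and the data (60 successes in 100 trials) are hypothetical; a CI for a mean with a small sample would need a t critical value instead, which this sketch does not cover.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf_level=0.95):
    """CI for one proportion: point estimate +/- critical value * SE."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # step 2: critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                       # step 3: CI uses p-hat
    me = z_star * se                                         # margin of error
    return p_hat - me, p_hat + me                            # step 4

# Hypothetical data: 60 successes in 100 trials
lower, upper = proportion_ci(60, 100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```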
Different types of tests
General conditions for HTs/CIs with means or proportions:
- Independence: random samples (< 10% of population sampled)
- Nearly normal data: either we know the population is normal or we have to use the CLT
Conditions for CLT:
- sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
- sample size ≥ 30. Else, you should use a t-distribution.
One sample test for mean
Conditions for one sample test:
- Independence: random samples (< 10% of population sampled)
- sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
- sample size ≥ 30. Else, you should use a t-distribution.
Note: if paired data, use differences as a one sample test
Conditions for two sample test:
- Independence: random samples (< 10% of population sampled)
- sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
- both sample sizes ≥ 30. Else, you should use a t-distribution (if either or both are smaller than 30).
Recap - inference for one proportion
Population parameter: $p$, point estimate: $\hat{p}$
Conditions:
- independence - random sample
- at least 10 successes and failures - if not, use randomization
Standard error: $SE = \sqrt{\frac{p(1-p)}{n}}$
- for CI: use $\hat{p}$
- for HT: use $p_0$
Recap - comparing two proportions
Population parameter: $(p_1 - p_2)$, point estimate: $(\hat{p}_1 - \hat{p}_2)$
Conditions:
- independence within groups - random sample and 10% condition met for both groups
- independence between groups
- at least 10 successes and failures in each group - if not, use randomization
$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
- for CI: use $\hat{p}_1$ and $\hat{p}_2$
- for HT:
  - when $H_0: p_1 = p_2$: use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
  - when $H_0: p_1 - p_2 =$ (some value other than 0): use $\hat{p}_1$ and $\hat{p}_2$ - this is pretty rare
Reference - standard error calculations

              one sample                        two samples
mean          $SE = \frac{s}{\sqrt{n}}$         $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
proportion    $SE = \sqrt{\frac{p(1-p)}{n}}$    $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

When working with means, it's very rare that $\sigma$ is known, so we usually use $s$.
When working with proportions,
- if doing a hypothesis test, $p$ comes from the null hypothesis
- if constructing a confidence interval, use $\hat{p}$ instead
When to use pooled standard error
For a two sample HT/CI for a difference of means, we can pool our information about the variance if $s_1$ and $s_2$ can be assumed to be roughly equal.
If we can do this, then we replace $s_1^2$ and $s_2^2$ with $s_{pool}^2$, where
$s_{pool}^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}$
The degrees of freedom for the t-distribution are now $df = n_1 + n_2 - 2$
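A small Python sketch of the pooled calculation follows. The function names and the summary statistics (s1 = 4, n1 = 10, s2 = 5, n2 = 12) are hypothetical, picked only to show the arithmetic.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, weighted by n - 1."""
    return (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)

def pooled_se(s1, n1, s2, n2):
    """Two-sample SE for a difference of means with s1^2 and s2^2 replaced by s_pool^2."""
    s2_pool = pooled_variance(s1, n1, s2, n2)
    return sqrt(s2_pool / n1 + s2_pool / n2)

# Hypothetical summary statistics
s1, n1, s2, n2 = 4, 10, 5, 12
s2_pool = pooled_variance(s1, n1, s2, n2)  # (16*9 + 25*11) / 20 = 20.95
se = pooled_se(s1, n1, s2, n2)
df = n1 + n2 - 2                           # degrees of freedom for the t-distribution
print(f"s2_pool = {s2_pool:.2f}, SE = {se:.3f}, df = {df}")
```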
When to use pooled proportion
When doing a HT for a two sample test for a difference of proportions, the null hypothesis is that $p_1 = p_2$, or $p_1 - p_2 = 0$.
Since we assume that $H_0$ is true in a HT, we assume $p_{pool} = p_1 = p_2$, which means our standard error is
$SE = \sqrt{\frac{p_{pool}(1 - p_{pool})}{n_1} + \frac{p_{pool}(1 - p_{pool})}{n_2}}$
We estimate $p_{pool}$ with
$\hat{p}_{pool} = \frac{\text{\# of successes}_1 + \text{\# of successes}_2}{n_1 + n_2}$
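The pooled standard error can be sketched in Python as part of a two-proportion Z test of H0: p1 = p2. The function name and the counts (45 of 100 vs 30 of 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(suc1, n1, suc2, n2):
    """Two-sided Z test of H0: p1 = p2 using the pooled proportion."""
    p_pool = (suc1 + suc2) / (n1 + n2)   # estimate of the common p under H0
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (suc1 / n1 - suc2 / n2) / se     # (p-hat1 - p-hat2 - 0) / SE
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pool, z, p_value

# Hypothetical counts: 45 successes out of 100 vs 30 successes out of 100
p_pool, z, p_value = two_proportion_z_test(45, 100, 30, 100)
print(f"p_pool = {p_pool:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```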
More on confidence intervals
- how to interpret a CI for the difference between two means
ANOVA - filling in the chart
Exercise 5.40

Educational attainment
        Less than HS   HS      Jr Coll   Bachelor's   Graduate   Total
Mean    38.67          39.6    41.39     42.55        40.85      40.45
SD      15.81          14.97   18.1      13.62        15.51      15.17
n       121            546     97        253          155        1,172

            Df      Sum Sq    Mean Sq   F value   Pr(>F)
degree      XXXXX   XXXXX     501.54    XXXXX     0.0682
Residuals   XXXXX   267,382   XXXXX
Total       XXXXX   XXXXX
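One way to fill in the missing entries is sketched below in Python, using only the numbers given in the exercise (5 groups, 1,172 observations, Mean Sq for degree = 501.54, Sum Sq for residuals = 267,382) and the standard ANOVA identities: Mean Sq = Sum Sq / Df and F = MSG / MSE. The variable names are my own.

```python
k = 5              # number of education groups
n_total = 1172     # total sample size
msg = 501.54       # Mean Sq for degree (given)
sse = 267382       # Sum Sq for residuals (given)

df_group = k - 1           # Df for degree = 4
df_resid = n_total - k     # Df for residuals = 1167
df_total = n_total - 1     # Df total = 1171

ssg = msg * df_group       # Sum Sq for degree = Mean Sq * Df
mse = sse / df_resid       # Mean Sq for residuals = Sum Sq / Df
f_value = msg / mse        # F = MSG / MSE
sst = ssg + sse            # Sum Sq total

print(f"Df: {df_group}, {df_resid}, {df_total}")
print(f"SSG = {ssg:.2f}, MSE = {mse:.2f}, F = {f_value:.2f}, SST = {sst:.2f}")
```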
Chi-square tests
Goodness of Fit: 1 variable
- $H_0$: There is no inconsistency between the observed and the expected counts.
- $H_A$: There is an inconsistency between the observed and the expected counts.
Test of Independence: 2 variables
- $H_0$: Variable 1 and Variable 2 are independent.
- $H_A$: Variable 1 and Variable 2 are not independent.
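For reference, both tests use the same statistic, the sum of (observed - expected)^2 / expected over all cells. The Python sketch below computes it for each test with hypothetical counts; the function names are my own, and the p-value would still come from a chi-square table with the degrees of freedom shown.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness of fit (hypothetical counts): expected counts = n * hypothesized proportions
observed = [30, 50, 20]
proportions = [0.25, 0.50, 0.25]
n = sum(observed)
expected = [n * p for p in proportions]
gof_stat = chi_square_stat(observed, expected)
gof_df = len(observed) - 1                      # df = k - 1

# Independence (hypothetical 2x2 table): expected = row total * column total / table total
table = [[20, 30], [30, 20]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)
obs_cells = [cell for row in table for cell in row]
exp_cells = [r * c / total for r in row_totals for c in col_totals]
ind_stat = chi_square_stat(obs_cells, exp_cells)
ind_df = (len(table) - 1) * (len(table[0]) - 1)  # df = (R - 1)(C - 1)

print(f"GOF: chi-square = {gof_stat:.2f} on {gof_df} df")
print(f"Independence: chi-square = {ind_stat:.2f} on {ind_df} df")
```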
Regression
Exercise 7.28

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  -0.012701   0.012638    -1.005   0.332
bac$beers     0.017964   0.002402     7.480   2.97e-06

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06

[Figure: BAC (grams per deciliter) plotted against cans of beer]

- Write the equation of the regression line.
- Interpret the slope and intercept in context.
- Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.
- What is $R^2$? Interpret $R^2$ in context.
- What is the correlation?
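A few of the quantities asked for can be read off or recomputed from the R output. The Python sketch below uses only the numbers printed there; the variable names are my own, and the interpretation in context is left to you.

```python
from math import sqrt

intercept = -0.012701
slope = 0.017964
slope_se = 0.002402
r_squared = 0.7998

# Equation of the regression line: BAC-hat = intercept + slope * beers
print(f"BAC-hat = {intercept} + {slope} * beers")

# t value for the slope is the estimate divided by its standard error
t_value = slope / slope_se

# Correlation is the square root of R^2, with the same sign as the slope
r = sqrt(r_squared) if slope > 0 else -sqrt(r_squared)

print(f"t = {t_value:.2f}, R^2 = {r_squared}, r = {r:.3f}")
```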
Regression
Conditions for regression:
- linearity
- nearly normal residuals
- constant variability (of residuals)
Residuals
- Predict BAC content for someone who has had 5 cans of beer: $\widehat{BAC} = -0.012701 + 0.017964 \times 5 = 0.077119$
- Observed BAC is 0.10
- Residual is $y_i - \hat{y}_i = 0.10 - 0.077 = 0.023$
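The prediction and residual can be checked with a short Python sketch. It uses the fitted coefficients from the Exercise 7.28 output (note the negative intercept); the function name is hypothetical.

```python
intercept = -0.012701   # from the R output for Exercise 7.28
slope = 0.017964

def predict_bac(beers):
    """Predicted BAC from the fitted line."""
    return intercept + slope * beers

predicted = predict_bac(5)
observed = 0.10
residual = observed - predicted   # residual = observed - predicted

print(f"predicted = {predicted:.3f}, residual = {residual:.3f}")
```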
Type I, Type II error

                        Decision
                   fail to reject H0    reject H0
Truth   H0 true    ✓                    Type 1 Error
        HA true    Type 2 Error         ✓

- Type 1 error is rejecting $H_0$ when you shouldn't have, and the probability of doing so is $\alpha$ (significance level)
- Type 2 error is failing to reject $H_0$ when you should have, and the probability of doing so is $\beta$ (more complicated to calculate)
- Power of a test is the probability of correctly rejecting $H_0$, and the probability of doing so is $1 - \beta$
- In hypothesis testing, we want to keep $\alpha$ and $\beta$ low, but there are inherent trade-offs.
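To see the trade-off numerically, here is a rough Python sketch of power for a one-sided Z test of a mean. It assumes a known population SD and a specific true mean under HA, which is a simplification of what we usually do with $s$ and the t-distribution; the function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_a, sigma, n, alpha):
    """Power of a test of H0: mu = mu0 vs HA: mu > mu0 when the true mean is mu_a.

    Assumes sigma is known (hypothetical simplification).
    """
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # reject H0 when Z > z_crit
    shift = (mu_a - mu0) / se                  # how far HA sits from H0, in SEs
    beta = NormalDist().cdf(z_crit - shift)    # P(fail to reject H0 | HA true)
    return 1 - beta

# Hypothetical setting: mu0 = 0, true mean 0.5, sigma = 1, n = 25
power_05 = power_one_sided_z(0, 0.5, 1, 25, alpha=0.05)
power_01 = power_one_sided_z(0, 0.5, 1, 25, alpha=0.01)
print(f"power at alpha = 0.05: {power_05:.2f}")
print(f"power at alpha = 0.01: {power_01:.2f}")
```

Lowering $\alpha$ from 0.05 to 0.01 in this setting lowers the power, so $\beta$ goes up.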
Calculating sample sizes
For our CIs, we can calculate the minimum sample size needed to provide a certain margin of error.
- For a desired (maximum) ME, call it $m$, we start with $m \ge ME = \text{critical value} \times SE$ (where SE is a function of $n$)
- Solve for $n$; the result should always look like $n \ge 123.45$
- Then round $n$ up to the next whole number (123.45 → 124).
- For a two-sample CI, you would need both sample sizes to be at least this large.
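Solving for $n$ can be sketched in Python for the two one-sample cases. The function names and inputs are hypothetical. The proportion version defaults to p = 0.5, a conservative choice that gives the largest SE, and the mean version uses a normal critical value with a guessed SD, both of which are assumptions for this example.

```python
from math import ceil
from statistics import NormalDist

def critical_z(conf_level):
    """Normal critical value z* for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

def sample_size_for_proportion(m, p=0.5, conf_level=0.95):
    """Solve m >= z* * sqrt(p(1-p)/n) for n, then round up."""
    n_min = (critical_z(conf_level) / m) ** 2 * p * (1 - p)
    return ceil(n_min)

def sample_size_for_mean(m, sd, conf_level=0.95):
    """Solve m >= z* * sd / sqrt(n) for n, then round up (sd is a guess)."""
    n_min = (critical_z(conf_level) * sd / m) ** 2
    return ceil(n_min)

# Hypothetical targets: ME of 3 percentage points; ME of 2 units with sd = 15
n_prop = sample_size_for_proportion(0.03)
n_mean = sample_size_for_mean(2, 15)
print(n_prop, n_mean)
```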