STA 101 Final Review


STA 101 Final Review Statistics 101 Thomas Leininger June 24, 2013

Announcements
- All work (besides projects) should be returned to you and should be entered on Sakai.
- Office Hour: 2-3pm today (Old Chem 114)
- Final Exam: 9am-12pm HERE
  - Bring a calculator; no cell phones, laptops, tablets, etc.
  - Allowed one 8 1/2 x 11 inch cheat sheet with notes on both sides. You must create this yourself.

Statistics 101 (Thomas Leininger) STA 101 Final Review June 24, 2013 2 / 19

Topics for today
- different conditions and hypotheses in each test (and a list of tests)
- when to use Z, T, F stats; 2 vs 1 sample tests; degrees of freedom
- pooled variance, pooled proportion (when to use)
- confidence intervals: is it always two-sided? how to interpret CI of difference b/w 2 means
- chi-square: what we are testing, when to use, how to approach hypotheses
- ANOVA (filling in the chart)
- Regression: interpreting linear lines and writing them, correlation, residuals
- Type I, II error
- Bayesian probability won't be on there (but cond'l probability might)
- MLR won't be on there

What you need to know about HTs:
Format for answering a hypothesis test question:
1. State the null and alternative hypotheses
2. Check conditions
3. Calculate the test statistic (T, Z, etc.) and standard error (if needed)
4. Calculate the p-value (double if two-sided hypothesis)
5. Reject or fail to reject the null hypothesis
6. Interpret your decision in context of the problem
Know how to interpret a p-value in context
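The calculation steps (3 to 5) can be sketched in a few lines of Python. This is only an illustration for a large-sample test on one mean using the normal model; the function name `one_sample_z_test` and the summary statistics are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, s, n, mu0, two_sided=True):
    """Large-sample (n >= 30) test of H0: mu = mu0 using the normal model."""
    se = s / sqrt(n)                     # step 3: standard error
    z = (xbar - mu0) / se                # step 3: test statistic
    # step 4: tail area beyond |z|; for a one-sided test this assumes the
    # observed difference points in the direction of the alternative
    tail = 1 - NormalDist().cdf(abs(z))
    p_value = 2 * tail if two_sided else tail
    return z, p_value

# Hypothetical summary statistics (assumed for illustration only)
z, p_value = one_sample_z_test(xbar=52.0, s=10.0, n=100, mu0=50.0)

alpha = 0.05                             # step 5: compare p-value to alpha
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

Steps 1, 2 and 6 (hypotheses, conditions, interpretation in context) still have to be written out by hand.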

What you need to know about CIs:
Format for answering a confidence interval question:
1. Check conditions
2. Find and state the critical value ($z^*$, $t^*_{df}$)
3. Calculate the standard error
4. Calculate the confidence interval
5. Interpret your confidence interval in context of the problem
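As a rough sketch of steps 2 to 4, the following computes a confidence interval for one mean when the sample is large enough to use $z^*$. The function name and the summary statistics are hypothetical; for small samples the critical value would come from a t table instead.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, s, n, level=0.95):
    """Large-sample CI for a mean: point estimate +/- z* x SE."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # step 2: critical value
    se = s / sqrt(n)                                    # step 3: standard error
    me = z_star * se                                    # margin of error
    return xbar - me, xbar + me                         # step 4: interval

# Hypothetical summary statistics (assumed for illustration only)
lower, upper = mean_confidence_interval(xbar=52.0, s=10.0, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```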

Different types of tests
General conditions for HTs/CIs with means or proportions:
- Independence: random samples (< 10% of population sampled)
- Nearly normal data: either we know the population is normal or we have to use CLT
Conditions for CLT:
- sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
- sample size ≥ 30. Else, you should use a t-distribution.

One sample test for mean
Conditions for one sample test:
- Independence: random samples (< 10% of population sampled)
- sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
- sample size ≥ 30. Else, you should use a t-distribution.
- Note: if paired data, use differences as a one sample test
Conditions for two sample test:
- Independence: random samples (< 10% of population sampled)
- sample looks nearly normal, no large skew, no major outliers. If not met, should use randomization
- both sample sizes ≥ 30. Else, you should use a t-distribution (if either or both are smaller than 30).

Recap - inference for one proportion
Population parameter: $p$, point estimate: $\hat{p}$
Conditions:
- independence: random sample
- at least 10 successes and failures; if not, use randomization
Standard error: $SE = \sqrt{\frac{p(1-p)}{n}}$
- for CI: use $\hat{p}$
- for HT: use $p_0$
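The difference between the two standard errors is easy to see numerically. The sketch below uses hypothetical counts (60 successes out of 100) and a hypothetical null value $p_0 = 0.5$; none of these numbers come from the slides.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

# Hypothetical data and null value (assumed for illustration only)
successes, n = 60, 100
p0 = 0.5
p_hat = successes / n

# Success-failure condition: at least 10 successes and 10 failures
condition_met = successes >= 10 and (n - successes) >= 10

se_ci = se_proportion(p_hat, n)   # for a CI, plug in p-hat
se_ht = se_proportion(p0, n)      # for a HT, plug in the null value p0
z = (p_hat - p0) / se_ht          # test statistic under H0: p = p0

print(f"SE for CI = {se_ci:.4f}, SE for HT = {se_ht:.4f}, z = {z:.2f}")
```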

Recap - comparing two proportions
Population parameter: $(p_1 - p_2)$, point estimate: $(\hat{p}_1 - \hat{p}_2)$
Conditions:
- independence within groups: random sample and 10% condition met for both groups
- independence between groups
- at least 10 successes and failures in each group; if not, use randomization
Standard error: $SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
- for CI: use $\hat{p}_1$ and $\hat{p}_2$
- for HT:
  - when $H_0: p_1 = p_2$: use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
  - when $H_0: p_1 - p_2 =$ (some value other than 0): use $\hat{p}_1$ and $\hat{p}_2$ (this is pretty rare)

Reference - standard error calculations

              one sample                        two samples
mean          $SE = \frac{s}{\sqrt{n}}$         $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
proportion    $SE = \sqrt{\frac{p(1-p)}{n}}$    $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

- When working with means, it's very rare that σ is known, so we usually use s.
- When working with proportions,
  - if doing a hypothesis test, p comes from the null hypothesis
  - if constructing a confidence interval, use $\hat{p}$ instead

When to use pooled standard error
- For a two sample HT/CI for a difference of means, we can pool our information about the variance if $s_1$ and $s_2$ can be assumed to be roughly equal
- If we can do this, then we replace $s_1^2$ and $s_2^2$ with $s_{pool}^2$, where
  $s_{pool}^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}$
- The degrees of freedom for the t-distribution are now $df = n_1 + n_2 - 2$
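A minimal sketch of the pooled calculation follows. The function name `pooled_se` and the sample standard deviations and sizes are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from math import sqrt

def pooled_se(s1, n1, s2, n2):
    """Pooled variance, pooled standard error, and df for a difference of means."""
    s2_pool = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)
    # s2_pool replaces both s1^2 and s2^2 in the two-sample SE formula
    se = sqrt(s2_pool / n1 + s2_pool / n2)
    df = n1 + n2 - 2
    return s2_pool, se, df

# Hypothetical sample summaries (assumed for illustration only)
s2_pool, se, df = pooled_se(s1=4.0, n1=10, s2=5.0, n2=12)
print(f"pooled variance = {s2_pool:.2f}, SE = {se:.4f}, df = {df}")
```

Here the pooled variance is (16 x 9 + 25 x 11) / 20 = 20.95, which lies between the two sample variances, as a weighted average should.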

When to use pooled proportion
- When doing a HT for a two sample test for a difference of proportions, the null hypothesis is that $p_1 = p_2$, or $p_1 - p_2 = 0$
- Since we assume that $H_0$ is true in a HT, we assume $p_{pool} = p_1 = p_2$, which means our standard error is
  $SE = \sqrt{\frac{p_{pool}(1 - p_{pool})}{n_1} + \frac{p_{pool}(1 - p_{pool})}{n_2}}$
- We estimate $p_{pool}$ with
  $\hat{p}_{pool} = \frac{\text{\# of successes}_1 + \text{\# of successes}_2}{n_1 + n_2}$
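The pooled standard error can be sketched as a small two-proportion z-test. The function name and the counts (60/100 vs. 45/100) are hypothetical; the test is two-sided, so the tail area is doubled.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # estimate of the common proportion
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pool, se, z, p_value

# Hypothetical success counts and sample sizes (assumed for illustration only)
p_pool, se, z, p_value = two_proportion_z_test(x1=60, n1=100, x2=45, n2=100)
print(f"p_pool = {p_pool:.3f}, SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```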

More on confidence intervals
- how to interpret diff b/w two means

ANOVA: filling in chart
Exercise 5.40

           Educational attainment
           Less than HS   HS      Jr Coll   Bachelor's   Graduate   Total
Mean       38.67          39.6    41.39     42.55        40.85      40.45
SD         15.81          14.97   18.1      13.62        15.51      15.17
n          121            546     97        253          155        1,172

            Df      Sum Sq    Mean Sq   F value   Pr(>F)
degree      XXXXX   XXXXX     501.54    XXXXX     0.0682
Residuals   XXXXX   267,382   XXXXX
Total       XXXXX   XXXXX
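One way to fill in the missing cells is to work from the group sizes and the two values that are given (Mean Sq for degree and Sum Sq for residuals). The sketch below uses only numbers from the chart; the variable names are illustrative.

```python
# Values read from the chart above
n_per_group = [121, 546, 97, 253, 155]   # group sizes
ms_group = 501.54                        # Mean Sq for degree (given)
ss_resid = 267382                        # Sum Sq for residuals (given)

k = len(n_per_group)                     # number of groups
n_total = sum(n_per_group)               # total sample size

df_group = k - 1                         # Df for degree
df_resid = n_total - k                   # Df for residuals
df_total = n_total - 1                   # Df total

ss_group = ms_group * df_group           # Sum Sq = Mean Sq x Df
ms_resid = ss_resid / df_resid           # Mean Sq = Sum Sq / Df
ss_total = ss_group + ss_resid
f_stat = ms_group / ms_resid             # F = MSG / MSE

print(f"Df: {df_group}, {df_resid}, {df_total}")
print(f"Sum Sq degree = {ss_group:.2f}, Sum Sq total = {ss_total:.2f}")
print(f"Mean Sq residuals = {ms_resid:.2f}, F = {f_stat:.2f}")
```

The given p-value of 0.0682 is larger than 0.05, so at the 5% level the F test fails to reject the null hypothesis that all group means are equal.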

Chi-square tests
Goodness of Fit: 1 variable
- $H_0$: There is no inconsistency between the observed and the expected counts.
- $H_A$: There is an inconsistency between the observed and the expected counts.
Test of Independence: 2 variables
- $H_0$: Variable 1 and Variable 2 are independent.
- $H_A$: Variable 1 and Variable 2 are not independent.

Regression
Exercise 7.28

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.012701   0.012638  -1.005    0.332
bac$beers    0.017964   0.002402   7.480 2.97e-06

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06

[Scatterplot: BAC (grams per deciliter) vs. cans of beer]

- Write the equation of the regression line.
- Interpret the slope and intercept in context.
- Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.
- What is $R^2$? Interpret $R^2$ in context.
- What is the correlation?
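Several of these questions can be checked directly from the printed output. The sketch below recomputes the slope's t value and the correlation from the reported estimates; the variable names are illustrative, and the small difference from the printed t value comes from rounding in the output.

```python
from math import sqrt

# Values read from the regression output above
b0 = -0.012701        # intercept estimate
b1 = 0.017964         # slope estimate (BAC per can of beer)
se_b1 = 0.002402      # standard error of the slope
r_squared = 0.7998    # Multiple R-squared
df = 14               # residual degrees of freedom

n = df + 2                    # simple linear regression: df = n - 2
t_slope = b1 / se_b1          # t value for H0: slope = 0
# The correlation takes the sign of the slope, which is positive here
r = sqrt(r_squared) if b1 > 0 else -sqrt(r_squared)

print(f"BAC-hat = {b0} + {b1} x beers")
print(f"n = {n}, t = {t_slope:.2f}, r = {r:.3f}")
```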

Regression
Conditions for regression
- linearity
- nearly normal residuals
- constant variability (of residuals)

Residuals
- Predict BAC content for someone who has had 5 cans of beer: $\widehat{BAC} = -0.012701 + 0.017964 \times 5 = 0.077119$
- Observed BAC is 0.10
- Residual is $y_i - \hat{y}_i = 0.10 - 0.077 = 0.023$
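The prediction and residual can be reproduced with the coefficients from the Exercise 7.28 output. The helper name `predict_bac` is hypothetical; evaluating the fitted line at 5 cans gives about 0.0771.

```python
# Coefficients from the Exercise 7.28 regression output
b0 = -0.012701   # intercept
b1 = 0.017964    # slope

def predict_bac(beers):
    """Predicted BAC from the fitted line: b0 + b1 x beers."""
    return b0 + b1 * beers

observed = 0.10
predicted = predict_bac(5)
residual = observed - predicted      # residual = observed - predicted

print(f"predicted = {predicted:.6f}, residual = {residual:.3f}")
```

A positive residual means the observed BAC lies above the regression line.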

Type I, Type II error

                    Decision
                    fail to reject H0    reject H0
Truth   H0 true     (correct)            Type 1 Error
        HA true     Type 2 Error         (correct)

- Type 1 error is rejecting $H_0$ when you shouldn't have, and the probability of doing so is α (significance level)
- Type 2 error is failing to reject $H_0$ when you should have, and the probability of doing so is β (more complicated to calculate)
- Power of a test is the probability of correctly rejecting $H_0$, and the probability of doing so is 1 - β
- In hypothesis testing, we want to keep α and β low, but there are inherent trade-offs.

Calculating sample sizes
- For our CIs, we can calculate the minimum sample size needed to provide a certain margin of error
- For a desired (maximum) ME, call it m, we start with $m \ge ME = \text{critical value} \times SE$, where SE is a function of n
- Solve for n; the result should always look like $n \ge 123.45$
- Then round n up to the next whole number (123.45 → 124).
- For a two-sample CI, you would need both sample sizes to be at least this large.
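For a CI on one proportion, solving $m \ge z^* \sqrt{p(1-p)/n}$ for n gives $n \ge (z^*)^2 p(1-p) / m^2$. The sketch below applies this with a hypothetical margin of error of 0.03 and the guess p = 0.5, which is an assumption (it gives the largest possible SE when no prior estimate is available).

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(m, level=0.95, p_guess=0.5):
    """Smallest n with z* x sqrt(p(1 - p) / n) <= m, before and after rounding up."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value
    n_min = z_star**2 * p_guess * (1 - p_guess) / m**2   # solve m >= ME for n
    return n_min, ceil(n_min)                            # always round up

# Hypothetical target margin of error (assumed for illustration only)
n_min, n_needed = sample_size_for_proportion(m=0.03)
print(f"n >= {n_min:.2f}, so use n = {n_needed}")
```

Rounding down would leave the margin of error slightly above the target, which is why n is always rounded up.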