Announcements. MT2 - Review. one variable. two variables

Transcription:

Announcements

MT2 - Review
Statistics 101
Dr. Çetinkaya-Rundel
November 4, 2014

Housekeeping

Peer evals for projects are due by Thursday. The Qualtrics email will come later this evening.
Additional MT review session this evening, 7-8pm, at Physics 130.
OH before MT:
  Derek - Tuesday: 4-6pm
  Xinyi - Tuesday: 7-9pm
  Dr. C-R - Wednesday: 12:30-2:30pm
  Michael - Wednesday: 5-7pm
SEC both days: 4-9pm

Map of concepts

  inference      HT                  CI
  theoretical    Z, T, F, chi-sq     Z, T
  simulation     randomization       bootstrap

Map of concepts: HT

one variable
  numerical
    H0: mu = mu_0
    H0: med = med_0 -> randomization
  categorical, 2 levels
    H0: p = p_0
    S-F fail -> randomization
  categorical, 3+ levels
    H0: p follows hypothesized distribution
    E >= 5 -> chi-sq GOF
    E < 5 -> randomization

two variables
  y = numerical, x = categorical
    x: 2 levels
      H0: mu_1 = mu_2
      H0: med_1 = med_2 -> randomization
    x: 3+ levels
      H0: All mu_i are equal -> ANOVA, F
      H0: All med_i are equal -> randomization
  y = categorical, x = categorical
    x&y: 2 levels
      H0: p_1 = p_2
      S-F fail -> randomization
    x/y: 3+ levels
      H0: x and y are independent
      E >= 5 -> chi-sq independence
      E < 5 -> randomization

Map of concepts: CI

one variable
  numerical
    mu
    med -> bootstrap
  categorical, 2 levels
    p
    S-F fail -> bootstrap
  categorical, 3+ levels
    N/A

two variables
  y = numerical, x = categorical
    x: 2 levels
      mu_1 - mu_2
      med_1 - med_2 -> bootstrap
    x: 3+ levels
      N/A
  y = categorical, x = categorical
    x&y: 2 levels
      p_1 - p_2
      S-F fail -> bootstrap
    x/y: 3+ levels
      N/A

Which of the following is true?

(a) If the sample size is large enough, conclusions can be generalized to the population.
(b) If subjects are randomly assigned to treatments, conclusions can be generalized to the population.
(c) Blocking in experiments serves a similar purpose as stratifying in observational studies.
(d) Representative samples allow us to make causal conclusions.
(e) Statistical inference requires normal distribution of the response variable.

Which of the following is the best visualization for evaluating the relationship between two variables?

(a) side-by-side box plots
(b) mosaic plot
(c) pie chart
(d) segmented frequency bar plot
(e) relative frequency histogram

Two students in an introductory statistics class choose to conduct similar studies estimating the proportion of smokers at their school. Student A collects data from 100 students, and student B collects data from 50 students. How will the standard errors used by the two students compare? Assume both are simple random samples.

(a) SE used by Student A < SE used by Student B.
(b) SE used by Student A > SE used by Student B.
(c) SE used by Student A = SE used by Student B.
(d) SE used by Student A SE used by Student B.
(e) Cannot tell without knowing the true proportion of smokers at this school.
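The standard error question above can be explored numerically. The sketch below uses the usual formula SE = sqrt(p(1 - p)/n) for a sample proportion. The value p = 0.2 is a hypothetical smoking rate chosen only for illustration (it is not given on the slides), and the comparison assumes both students plug in roughly the same proportion.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.2  # hypothetical proportion of smokers, for illustration only

se_a = se_proportion(p, 100)  # Student A, n = 100
se_b = se_proportion(p, 50)   # Student B, n = 50

# The ratio depends only on the sample sizes: sqrt(100 / 50) = sqrt(2).
ratio = se_b / se_a

print(f"SE (A, n=100) = {se_a:.4f}")
print(f"SE (B, n=50)  = {se_b:.4f}")
print(f"SE_B / SE_A   = {ratio:.4f}")
```

Because n appears in the denominator, the larger sample gives the smaller standard error whatever common value of p is used.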

Which of the following is the best method for evaluating the relationship between two variables?

(a) chi-square test of independence
(b) chi-square test of goodness of fit
(c) anova
(d) linear regression
(e) t-test

Which of the following is the best method for evaluating the relationship between a numerical and a categorical variable with many levels?

(a) z-test
(b) chi-square test of goodness of fit
(c) anova
(d) linear regression
(e) t-test

Data are collected at a bank on 6 tellers' randomly sampled transactions. Do average transaction times vary by teller?

Response variable: numerical, Explanatory variable: categorical
ANOVA

Summary statistics:

  teller    n    mean      sd
  1        14    65.7857   15.2249
  2        23    79.9174   23.284
  3        15    82.66     18.1842
  4        15    77.9933   23.2754
  5        44    81.7295   21.5768
  6        29    75.3069   20.4814

[Plot: transaction times (axis 20 to 120) by teller, 1 to 6]

H_0: All mu_i are equal.
H_A: At least one mu_i is different.

Analysis of Variance Table
Response: data

             Df   Sum Sq   Mean Sq   F value   Pr(>F)
  group       5     3315    663.06     1.508   0.1914
  Residuals 134    58919    439.69
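The teller ANOVA table above can be rebuilt from the summary statistics alone. The following is a minimal sketch, not the code used to produce the slides: the function name `anova_from_summary` is made up for this example, and the p-value is left out because it needs an F distribution, which the Python standard library does not provide.

```python
# (n, mean, sd) for tellers 1 to 6, copied from the summary statistics.
groups = [
    (14, 65.7857, 15.2249),
    (23, 79.9174, 23.284),
    (15, 82.66, 18.1842),
    (15, 77.9933, 23.2754),
    (44, 81.7295, 21.5768),
    (29, 75.3069, 20.4814),
]

def anova_from_summary(groups):
    """One-way ANOVA quantities from per-group (n, mean, sd) triples."""
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between-group variability: group means around the grand mean.
    ss_group = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group variability: pooled from the group standard deviations.
    ss_resid = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_group, df_resid = k - 1, n_total - k
    ms_group = ss_group / df_group
    ms_resid = ss_resid / df_resid
    return {
        "df_group": df_group,
        "df_resid": df_resid,
        "ss_group": ss_group,
        "ss_resid": ss_resid,
        "ms_group": ms_group,
        "ms_resid": ms_resid,
        "f": ms_group / ms_resid,
    }

table = anova_from_summary(groups)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

The same function can be applied to the download-time exercise by swapping in its three (n, mean, sd) triples.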

Application exercise: ANOVA - 5.4

Data are collected on download times at three different times during the day. We want to evaluate whether average download times vary by time of day. Fill in the ??s in the ANOVA output below.

Response variable: numerical, Explanatory variable: categorical

Summary statistics:

  time of day         n    mean       sd
  Early (7AM)        16    113.375    47.6541
  Eve (5 PM)         16    273.3125   52.1929
  Late (12 AM)       16    193.0625   40.9023

[Plot: download times (axis 100 to 350) by time of day: Early (7AM), Evening (5 PM), Late Night (12 AM)]

Analysis of Variance Table
Response: data

             Df   Sum Sq   Mean Sq   F value   Pr(>F)
  group      ??       ??        ??        ??   1.306e-11
  Residuals  ??   100020        ??
  Total      ??   304661

What is the result of the ANOVA?

Since 1.306e-11 < 0.05, we reject the null hypothesis. The data provide convincing evidence that the average download time is different for at least one pair of times of day.

Application exercise: ANOVA (cont.) - 5.5

The next step is to evaluate the pairwise tests. There are 3 pairs of times of day:

1 Early vs. Evening: left side of class (facing the board)
2 Evening vs. Late Night: center of class
3 Early vs. Late Night: right side of class

Determine the appropriate significance level for these tests, and then complete the test assigned to your team.

α = 0.05/3 = 0.0167

(1) Early vs. Evening

$T_{45} = \frac{113.375 - 273.3125}{\sqrt{\frac{2223}{16} + \frac{2223}{16}}} = \frac{-159.9375}{16.67} = -9.59$, p-val < 0.01

(2) Evening vs. Late Night

$T_{45} = \frac{273.3125 - 193.0625}{\sqrt{\frac{2223}{16} + \frac{2223}{16}}} = \frac{80.25}{16.67} = 4.81$, p-val < 0.01

(3) Early vs. Late Night

$T_{45} = \frac{113.375 - 193.0625}{\sqrt{\frac{2223}{16} + \frac{2223}{16}}} = \frac{-79.6875}{16.67} = -4.78$, p-val < 0.01
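The pairwise comparisons in this exercise follow one recipe: divide α by the number of pairs, then compute each T statistic with the ANOVA mean square error in place of the group variances. The sketch below reproduces that arithmetic from the numbers on the slides (means, n = 16 per group, MSE = 2223, df = 45). The variable names are chosen for this example, and p-values are not computed because they need a t distribution, which is outside the Python standard library.

```python
import math
from itertools import combinations

# Group means from the download-time summary statistics.
means = {"Early": 113.375, "Evening": 273.3125, "Late Night": 193.0625}
n_per_group = 16
mse = 2223      # Mean Sq of Residuals from the ANOVA table
df_resid = 45   # Df of Residuals from the ANOVA table

pairs = list(combinations(means, 2))

# Bonferroni-corrected significance level: alpha / number of comparisons.
alpha_star = 0.05 / len(pairs)

# Every pair shares the same standard error because the group sizes are equal.
se = math.sqrt(mse / n_per_group + mse / n_per_group)

t_stats = {(a, b): (means[a] - means[b]) / se for a, b in pairs}

for (a, b), t in t_stats.items():
    print(f"{a} vs. {b}: T_{df_resid} = {t:.2f}")
print(f"Compare each p-value to alpha* = {alpha_star:.4f}")
```

The sign of each statistic depends only on the order of subtraction; the two-sided conclusion is the same either way.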

What percent of variability in download times is explained by time of day?

Response: data

             Df   Sum Sq   Mean Sq   F value   Pr(>F)
  group       2   204641    102320    46.035   1.306e-11
  Residuals  45   100020      2223

(a) $\frac{204641}{204641 + 100020} = 0.67$
(b) $\frac{204641}{100020}$
(c) $\frac{100020}{204641}$
(d) $\frac{102320}{102320 + 2223}$

n = 50 and p̂ = 0.80. Hypotheses: H_0: p = 0.82; H_A: p ≠ 0.82. We use a randomization test because the sample size isn't large enough for p̂ to be distributed nearly normally (50 × 0.82 = 41 ≥ 10, but 50 × 0.18 = 9 < 10). Which of the following is the correct set up for this hypothesis test? Red: success, blue: failure, p̂_sim = proportion of reds in simulated samples.

(a) Place 80 red and 20 blue chips in a bag. Sample, with replacement, 50 chips and calculate the proportion of reds. Repeat this many times and calculate the proportion of simulations where p̂_sim 0.82.
(b) Place 82 red and 18 blue chips in a bag. Sample, without replacement, 50 chips and calculate the proportion of reds. Repeat this many times and calculate the proportion of simulations where p̂_sim 0.80.
(c) Place 82 red and 18 blue chips in a bag. Sample, with replacement, 50 chips and calculate the proportion of reds. Repeat this many times and calculate the proportion of simulations where p̂_sim ≤ 0.80 or p̂_sim ≥ 0.84.
(d) Place 82 red and 18 blue chips in a bag. Sample, with replacement, 100 chips and calculate the proportion of reds. Repeat this many times and calculate the proportion of simulations where p̂_sim ≤ 0.80 or p̂_sim ≥ 0.84.

Randomization distribution

What is / should be the center of the randomization distribution?
What is the result of the hypothesis test?

[Plot: randomization distribution, axis "randomization statistic" from 0.70 to 0.90, observed value 0.8 marked]

Summary of methods

Inference for numerical data:

One numerical:
  Parameter of interest: µ
  n ≥ 30 -> Z, T; n < 30 -> T

One numerical vs. one categorical (with 2 levels):
  Parameter of interest: µ_1 - µ_2
  n_1 and n_2 ≥ 30 -> Z, T; n_1 or n_2 < 30 -> T
  If samples are dependent (paired), first find differences between paired observations

One numerical vs. one categorical (with 3+ levels):
  Parameter of interest: NA
  ANOVA
  HT only

For all other parameters of interest: simulation
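The randomization test for H_0: p = 0.82 with n = 50 and p̂ = 0.80 can be tried out in code. The sketch below is one possible implementation rather than the course's own: it fills a bag with 82 red and 18 blue chips, draws 50 with replacement, and counts simulated proportions at least as far from 0.82 as the observed 0.80 in either direction. The function name, the number of simulations and the seed are all assumptions made for this example.

```python
import random

def randomization_p_value(n, p_null, p_obs, n_sims=10_000, seed=2014):
    """Two-sided randomization p-value for a single proportion."""
    rng = random.Random(seed)
    # Bag of 100 chips whose share of reds matches the null value.
    reds_in_bag = round(100 * p_null)
    bag = ["red"] * reds_in_bag + ["blue"] * (100 - reds_in_bag)
    observed_gap = abs(p_obs - p_null)
    extreme = 0
    for _ in range(n_sims):
        draws = rng.choices(bag, k=n)  # sampling with replacement
        p_sim = draws.count("red") / n
        # Small tolerance guards against floating-point ties at 0.80 and 0.84.
        if abs(p_sim - p_null) >= observed_gap - 1e-9:
            extreme += 1
    return extreme / n_sims

p_value = randomization_p_value(n=50, p_null=0.82, p_obs=0.80)
print(f"simulated two-sided p-value: {p_value:.3f}")
```

Since the simulated samples are generated under the null hypothesis, the simulated proportions should pile up around 0.82, and an observed value of 0.80 sits close to that center.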

Summary of methods

Inference for categorical data - binary outcome:

One categorical:
  Parameter of interest: p
  S/F condition met -> Z, if not simulation

One categorical vs. one categorical, each with only 2 outcomes:
  Parameter of interest: p_1 - p_2
  S/F condition met -> Z, if not simulation

S/F: use obs. S and F for CIs and exp. for HT

Inference for categorical data - 3+ outcomes:

One categorical, compared to hypothetical distribution:
  Parameter of interest: NA
  At least 5 exp. successes in each cell -> χ² GOF, if not simulation
  HT only

One categorical vs. one categorical, either with 3+ outcomes:
  Parameter of interest: NA
  At least 5 exp. successes in each cell -> χ² Independence, if not simulation
  HT only
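The condition checks in this summary are simple enough to write down as code. The sketch below is illustrative only: both function names are invented for this example, the thresholds of 10 (S/F) and 5 (expected cell counts) are the ones used on the slides, and the expected counts passed to the chi-square check are made-up numbers.

```python
def sf_condition(successes, failures, minimum=10):
    """Success-failure condition: both counts must reach the minimum.

    Pass observed counts for a CI and expected counts (under H_0) for an HT.
    """
    return successes >= minimum and failures >= minimum

def chi_sq_condition(expected_counts, minimum=5):
    """Chi-square condition: every expected cell count must reach the minimum."""
    return all(count >= minimum for count in expected_counts)

# HT for p with n = 50 and p_0 = 0.82: expected 41 successes and 9 failures.
ht_ok = sf_condition(41, 9)

# CI for p with n = 50 and p-hat = 0.80: observed 40 successes and 10 failures.
ci_ok = sf_condition(40, 10)

# Hypothetical expected counts for a goodness-of-fit test.
gof_ok = chi_sq_condition([12, 7.5, 5])

print("HT: use Z" if ht_ok else "HT: use simulation")
print("CI: use Z" if ci_ok else "CI: use simulation")
print("GOF: use chi-sq" if gof_ok else "GOF: use simulation")
```

Taking counts rather than n and p as arguments keeps the check exact at the boundary, where products such as 50 × 0.2 can fall just below 10 in floating-point arithmetic.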