Sociology 6Z03 Review II

John Fox
McMaster University
Fall 2016

Outline: Review II
- Probability Part I
- Sampling Distributions
- Probability Part II
- Confidence Intervals
- Hypothesis Tests
- Inference for Means
- Inference for Proportions
- Inference for Contingency Tables
- Inference for Regression Analysis
- One-Way Analysis of Variance

Probability (Part I): Probability Basics
- Experiment
- Outcomes
- Sample space
- Events
- The axioms of probability theory

Probability (Part I): Discrete and Continuous Random Variables
- Discrete random variables
  - Probability distribution
  - Mean, variance, and standard deviation:
    $$\mu = \sum x_i p_i \qquad \sigma^2 = \sum (x_i - \mu)^2 p_i \qquad \sigma = +\sqrt{\sigma^2}$$
- Continuous random variables
  - Density curves
  - Normal distributions
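The formulas for the mean, variance, and standard deviation of a discrete random variable can be checked numerically. The following is a minimal sketch; the function name `discrete_summary` and the fair-die example are illustrative assumptions, not part of the course materials.

```python
import math

def discrete_summary(values, probs):
    """Return (mean, variance, sd) of a discrete random variable."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # mu = sum of x_i * p_i
    mu = sum(x * p for x, p in zip(values, probs))
    # sigma^2 = sum of (x_i - mu)^2 * p_i
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    # sigma is the positive square root of the variance
    return mu, var, math.sqrt(var)

# Hypothetical example: one roll of a fair six-sided die.
die_mean, die_var, die_sd = discrete_summary(range(1, 7), [1 / 6] * 6)
print(die_mean, die_var, die_sd)
```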

Sampling Distributions: Statistical Inference
- Statistical inference: drawing conclusions about populations from random samples
- Characteristics of populations: parameters (Greek letters, e.g., $\mu$, $\sigma$)
- Characteristics of samples: statistics (Roman letters, e.g., $\bar{x}$, $s$)
- Statistics vary from sample to sample

Sampling Distributions: Sampling Distribution of a Statistic
- Repeated sampling
- Sampling variability
- Bias, variance, and mean-square error
- The sampling distribution of sample means
  - From a normal population: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
  - The central limit theorem, almost any population: $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$
- Using simulation to explore sampling distributions
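One way to explore a sampling distribution by simulation is to draw many samples and look at how the sample means vary. The sketch below assumes a hypothetical skewed population (exponential with mean 1 and standard deviation 1); the helper name `simulate_sample_means`, the sample size, and the number of replications are all illustrative choices.

```python
import math
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size `n` and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

# Hypothetical population: exponential with mean 1 and standard deviation 1.
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=30, reps=5000)

mean_of_means = statistics.fmean(means)   # should be close to mu = 1
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)
theoretical_sd = 1.0 / math.sqrt(30)
print(mean_of_means, sd_of_means, theoretical_sd)
```

Even though the population is far from normal, the simulated standard deviation of the sample means should land near $\sigma/\sqrt{n}$, as the central limit theorem suggests.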

Probability (Part II)
- Venn diagrams
- Addition rule for non-disjoint events: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
- Independent and dependent events. For $A$ and $B$ independent: $P(A \text{ and } B) = P(A)P(B)$
- Conditional probability: $P(B \mid A) = P(A \text{ and } B)/P(A)$
- Tree diagrams

Probability (Part II): Binomial Distributions
- Formula for the binomial distribution with $n$ trials and probability of success $p$:
  $$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
- Mean: $E(X) = np$
- Variance: $V(X) = np(1 - p)$
- Normal approximation to the binomial
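The binomial formula and its mean and variance can be verified directly. This is a small sketch; the values `n = 10` and `p = 0.3` are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical number of trials and success probability
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the distribution itself
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, n * p)                   # both should agree with np
print(variance, n * p * (1 - p))     # both should agree with np(1 - p)
```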

Confidence Intervals
- Point estimation vs. interval estimation
- Confidence intervals: estimate ± margin of error
- Proper interpretation (e.g., of a 95-percent confidence interval): With repeated sampling, 95 percent of confidence intervals constructed by this method will include the true value of the parameter and 5 percent will miss the true value. (The confidence interval for any particular sample either includes the parameter or misses it.)
- Confidence interval for the population mean $\mu$:
  $$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

Confidence Intervals: Margin of Error
- The margin of error $z^* \sigma/\sqrt{n}$ gets smaller when
  - the level of confidence $C$ is made smaller
  - the population standard deviation $\sigma$ gets smaller
  - the sample size $n$ gets larger
- Choosing the sample size for a desired margin of error $m$:
  $$n = \left( \frac{z^* \sigma}{m} \right)^2$$
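The interval and sample-size formulas can be sketched with the standard library's `statistics.NormalDist`. The function names and the numbers used (mean 100, $\sigma = 15$, $n = 36$, margin 3) are hypothetical; rounding the sample size up to the next whole number is an assumption made here so that the margin of error is no larger than $m$.

```python
import math
from statistics import NormalDist

def z_critical(C):
    """Critical value z* with probability (1 - C)/2 to its right."""
    return NormalDist().inv_cdf(1 - (1 - C) / 2)

def mean_confidence_interval(xbar, sigma, n, C=0.95):
    """Level-C interval xbar -/+ z* sigma / sqrt(n), sigma assumed known."""
    margin = z_critical(C) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def sample_size_for_margin(sigma, m, C=0.95):
    """Smallest whole n with margin of error at most m (rounded up, by assumption)."""
    return math.ceil((z_critical(C) * sigma / m) ** 2)

# Hypothetical inputs
low, high = mean_confidence_interval(xbar=100, sigma=15, n=36)
n_needed = sample_size_for_margin(sigma=15, m=3)
print(low, high, n_needed)
```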

Hypothesis Tests: The Null and Alternative Hypotheses
- Null hypothesis $H_0$ and alternative hypothesis $H_a$
- Directional (one-sided) alternative hypothesis:
  $H_0: \mu = \mu_0$, $H_a: \mu > \mu_0$
  or
  $H_0: \mu = \mu_0$, $H_a: \mu < \mu_0$
- Nondirectional (two-sided) alternative hypothesis:
  $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$

Hypothesis Tests: Steps in Hypothesis Testing (for Means)
1. State the null and alternative hypotheses.
2. From the data, calculate the test statistic
   $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
3. The null sampling distribution of the test statistic is the standard normal distribution, $z \sim N(0, 1)$.
4. Calculate the P-value: the probability of obtaining a sample result (value of the test statistic) at least as extreme as the one observed if $H_0$ is true.
- Statistical significance and the significance level $\alpha$
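The four steps can be sketched as a short function. This assumes $\sigma$ is known, as in the z procedure above; the function name `z_test`, the `alternative` labels, and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, P-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # null sampling distribution N(0, 1)
    if alternative == "greater":      # Ha: mu > mu0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical data: xbar = 52 from n = 25 observations, sigma = 10, mu0 = 50
z, p_value = z_test(xbar=52, mu0=50, sigma=10, n=25)
print(z, p_value)
```

The result is statistically significant at level $\alpha$ when the P-value is at most $\alpha$.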

Hypothesis Tests: Hypothesis Tests and Confidence Intervals
- A two-sided test at the level $\alpha$ (e.g., .05) corresponds to a confidence interval with level $C = 1 - \alpha$ (e.g., .95 or 95 percent)
- Cautions concerning confidence intervals:
  - Data must be a SRS from a large population
  - Beware of outliers and non-normality in small samples
  - The margin of error covers only random sampling errors

Hypothesis Tests: Hypothesis Testing as Decision Making

                 State of nature
Decision         H_0 true            H_0 false
Reject H_0       Type I error        Correct decision
Accept H_0       Correct decision    Type II error

- Probability of Type I error = $\alpha$ (the level of the test)
- Power of the test = $1 - P(\text{Type II error})$
- The power of the test goes up as
  - the sample size $n$ grows
  - the true value of $\mu$ gets farther from the null value $\mu_0$
  - the $\alpha$ level is made larger
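The three statements about power can be illustrated for a one-sided z test of $H_0: \mu = \mu_0$ against $H_a: \mu > \mu_0$ with known $\sigma$. The slides do not give a power formula; the expression used below, $1 - \Phi\left(z_\alpha - (\mu - \mu_0)/(\sigma/\sqrt{n})\right)$, is the standard one for this test, and the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu_true, mu0, sigma, n, alpha=0.05):
    """Power of the level-alpha z test of H0: mu = mu0 vs Ha: mu > mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)            # rejection cutoff for z
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))   # true mean of z under mu_true
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical setting: mu0 = 50, sigma = 10
base = power_one_sided_z(mu_true=52, mu0=50, sigma=10, n=25)
bigger_n = power_one_sided_z(mu_true=52, mu0=50, sigma=10, n=100)
farther_mu = power_one_sided_z(mu_true=54, mu0=50, sigma=10, n=25)
larger_alpha = power_one_sided_z(mu_true=52, mu0=50, sigma=10, n=25, alpha=0.10)
print(base, bigger_n, farther_mu, larger_alpha)
```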

Inference for Means: Single-Sample t-tests and t-intervals
- Assumptions for the single-sample t procedures:
  - Data are a SRS.
  - The population is normal with mean $\mu$ and standard deviation $\sigma$, both of which are unknown.
- The statistic
  $$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
  follows a t-distribution with $n - 1$ degrees of freedom.
- The standard error of the sample mean is $SE = s/\sqrt{n}$.
- To test the hypothesis $H_0: \mu = \mu_0$, calculate the test statistic
  $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
- To construct a level $C$ confidence interval for $\mu$, find the critical value $t^*$ from the t-distribution with $n - 1$ degrees of freedom, and with probability $(1 - C)/2$ to the right. Then calculate
  $$\text{estimate} \pm t^* SE = \bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
- For matched-pairs data, the single-sample t-test and t-interval procedures are applied to the differences between the pairs.
- The t procedures are robust with respect to violation of the assumption of normality if the sample size is large.
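A minimal sketch of the single-sample t calculations follows. The sample values are hypothetical, and because the Python standard library has no t distribution, the critical value $t^*$ is passed in by hand (2.262 is the tabled value for 9 degrees of freedom and $C = 0.95$).

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, SE, df) for testing H0: mu = mu0."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # SE = s / sqrt(n)
    return (xbar - mu0) / se, se, n - 1

def t_interval(data, t_star):
    """Return xbar -/+ t* SE; t_star must come from a t table or software."""
    xbar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(len(data))
    return xbar - margin, xbar + margin

# Hypothetical sample of n = 10 measurements
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.2, 5.7, 5.4]
t_stat, se, df = one_sample_t(sample, mu0=5.0)
low, high = t_interval(sample, t_star=2.262)  # t* for df = 9, C = 0.95
print(t_stat, se, df, low, high)
```

For matched-pairs data, the same functions would be applied to the list of within-pair differences.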

Inference for Means: Two-Sample t-tests and t-intervals
- Assumptions for the two-sample t procedures:
  - We have two independent SRSs from two populations.
  - Both populations are normally distributed, with unknown means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$.
- The statistic
  $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
  follows a t-distribution with degrees of freedom approximated by the smaller of $n_1 - 1$ and $n_2 - 1$ (or by a complicated formula).
- The standard error of the difference in sample means $\bar{x}_1 - \bar{x}_2$ is
  $$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
- To test the hypothesis $H_0: \mu_1 - \mu_2 = 0$, calculate the test statistic
  $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
- To construct a level $C$ confidence interval for $\mu_1 - \mu_2$, find the critical value $t^*$ from the t-distribution with the smaller of $n_1 - 1$ and $n_2 - 1$ degrees of freedom, and with probability $(1 - C)/2$ to the right. Then calculate
  $$\text{estimate} \pm t^* SE = (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
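The two-sample test statistic can be computed from summary statistics alone. In this sketch the function name and the summary values are hypothetical, and the degrees of freedom use the conservative "smaller of $n_1 - 1$ and $n_2 - 1$" rule stated above rather than the more complicated formula.

```python
import math

def two_sample_t(x1bar, s1, n1, x2bar, s2, n2):
    """Return (t, SE, df) for testing H0: mu1 - mu2 = 0."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # SE of the difference in means
    t = (x1bar - x2bar) / se
    df = min(n1, n2) - 1                     # conservative degrees of freedom
    return t, se, df

# Hypothetical summary statistics for two independent samples
t_stat, se, df = two_sample_t(x1bar=10, s1=3, n1=9, x2bar=8, s2=4, n2=16)
print(t_stat, se, df)
```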

Inference for Proportions: Single-Sample Tests and Intervals for a Population Proportion p
- The sample proportion is
  $$\hat{p} = \frac{\text{count of successes in the sample}}{n}$$
- Assumptions:
  - The data are a SRS from the population.
  - For a test, $np_0$ and $n(1 - p_0)$ are both at least 10; for a confidence interval, the counts of successes and failures are both at least 15.
- The statistic
  $$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
  follows an approximate standard normal distribution $N(0, 1)$.
- To test the hypothesis $H_0: p = p_0$, calculate the test statistic
  $$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
- To construct a level $C$ confidence interval for $p$, find the critical value $z^*$ from the standard normal distribution with probability $(1 - C)/2$ to the right. Then calculate
  $$\text{estimate} \pm z^* SE = \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
- To find the sample size for a desired margin of error $m$:
  $$n = \left( \frac{z^*}{m} \right)^2 p^*(1 - p^*)$$
  where $p^*$ is a guessed value for the population proportion (conservatively, .5).
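The single-proportion test, interval, and sample-size formulas can be sketched as follows. The function names and the example counts (60 successes in 100 trials, margin .03) are hypothetical, and rounding the sample size up is an assumption made here.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Test statistic for H0: p = p0 (uses p0 in the standard error)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def proportion_ci(successes, n, C=0.95):
    """Level-C interval p_hat -/+ z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size_for_proportion(m, C=0.95, p_star=0.5):
    """n = (z*/m)^2 p*(1 - p*), rounded up (by assumption)."""
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
    return math.ceil((z_star / m) ** 2 * p_star * (1 - p_star))

# Hypothetical data: 60 successes in 100 trials, null value p0 = 0.5
z = one_proportion_z(60, 100, p0=0.5)
low, high = proportion_ci(60, 100)
n_needed = sample_size_for_proportion(m=0.03)
print(z, low, high, n_needed)
```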

Inference for Proportions: Two-Sample Tests and Intervals for a Difference in Population Proportions
- The difference in population proportions is $p_1 - p_2$.
- The sample difference in proportions $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed with mean $p_1 - p_2$ and standard deviation
  $$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
- To test the hypothesis $H_0: p_1 - p_2 = 0$ (i.e., $H_0: p_1 = p_2$), when the counts of successes and failures in both samples are all at least 5:
  1. Calculate the pooled sample proportion
     $$\hat{p} = \frac{\text{count of successes in both samples combined}}{n_1 + n_2}$$
  2. Calculate the test statistic
     $$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
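The two steps of the pooled test can be sketched as below. The function name and the counts (45 of 100 versus 30 of 100) are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, pooled proportion) for testing H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # step 1: pooled sample proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se, pooled  # step 2: test statistic

# Hypothetical counts of successes in two independent samples
z, pooled = two_proportion_z(x1=45, n1=100, x2=30, n2=100)
print(z, pooled)
```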

Inference for Proportions: Two-Sample Tests and Intervals for a Difference in Population Proportions (continued)
- To construct a level $C$ confidence interval for $p_1 - p_2$ (when the counts of successes and failures in both samples are all at least 10):
  1. Find the critical value $z^*$ from the standard normal distribution with probability $(1 - C)/2$ to the right.
  2. Calculate
     $$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
  3. Calculate
     $$\text{estimate} \pm z^* SE = (\hat{p}_1 - \hat{p}_2) \pm z^* SE$$

Inference for Contingency Tables
- The chi-square test for independence is used to test the null hypothesis that two categorical variables are unrelated in the population.
- Expected counts for each cell of the $r \times c$ table under the null hypothesis are calculated as
  $$\text{expected count} = \frac{\text{row total} \times \text{column total}}{n}$$
- The test statistic
  $$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
  is approximately distributed as chi-square with $(r - 1)(c - 1)$ degrees of freedom.
- For the chi-square test to be accurate:
  - No more than 20 percent of the expected counts should be less than 5, and all of the expected counts should be 1 or larger.
  - The data should be a SRS from the population.
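The expected counts and the chi-square statistic can be computed from a table of observed counts. This sketch stops at the statistic and its degrees of freedom, since the P-value needs a chi-square table or software; the function name and the 2 × 2 table are hypothetical.

```python
def chi_square_independence(table):
    """Return (X2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # expected count = row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected

# Hypothetical 2 x 2 table of observed counts
observed = [[30, 20], [20, 30]]
x2, df, expected = chi_square_independence(observed)
print(x2, df, expected)
```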

Inference for Regression Analysis: Simple Linear Regression: Statistical Model
- Recall that the least-squares line $\hat{y} = a + bx$ in simple regression is given by
  $$b = r \frac{s_y}{s_x} \qquad a = \bar{y} - b\bar{x}$$
- Inference in simple regression is based on a statistical model, which is assumed to describe the population:
  1. Linearity: The average response in the population is $\mu_y = \alpha + \beta x$.
  2. Constant spread: The population standard deviation $\sigma$ of $y$ is the same for all values of $x$.
  3. Normality: For any fixed value of $x$, the response $y$ follows a normal distribution.
  4. Independence: Observations are sampled independently.

Inference for Regression Analysis: Simple Linear Regression: Standard Error of the Regression
- The standard error about the regression line $s$ estimates $\sigma$:
  $$s = \sqrt{\frac{1}{n - 2} \sum \text{residual}^2}$$
  where the residual is $y - \hat{y}$.
- The degrees of freedom for $s$ are $n - 2$.
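The least-squares coefficients and the standard error of the regression can be computed directly from the formulas above. The function name and the five data points are hypothetical.

```python
import math
import statistics

def simple_regression(x, y):
    """Return (a, b, s) for the least-squares line y-hat = a + b x."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    # Sample correlation r
    r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)
    b = r * sy / sx          # slope
    a = ybar - b * xbar      # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e**2 for e in residuals) / (n - 2))  # n - 2 degrees of freedom
    return a, b, s

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s = simple_regression(x, y)
print(a, b, s)
```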

Inference for Regression Analysis: Simple Linear Regression: Confidence Intervals and Hypothesis Tests for the Slope
- Confidence intervals and hypothesis tests for the population slope $\beta$ use the t distribution.
- The standard error of the sample least-squares slope $b$ is
  $$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$
- The statistic
  $$t = \frac{b - \beta}{SE_b}$$
  follows a t distribution with $n - 2$ degrees of freedom.
- To test the null hypothesis $H_0: \beta = 0$, calculate the test statistic
  $$t = \frac{b}{SE_b}$$
- To construct a confidence interval for $\beta$, find the critical value $t^*$ from the t-distribution with $n - 2$ degrees of freedom. Then calculate
  $$\text{estimate} \pm t^* SE = b \pm t^* SE_b$$

Inference for Regression Analysis: Multiple Linear Regression
- Inference in multiple regression is similar, but based on the model
  $$\mu_y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
- The least-squares fit is
  $$\hat{y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
- The standard deviation $\sigma$ of $y$ around the population regression is estimated by
  $$s = \sqrt{\frac{1}{n - k - 1} \sum \text{residual}^2}$$
- The degrees of freedom for $s$ are $n - k - 1$.
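For the simple-regression slope, the standard error $SE_b$ and the test statistic for $H_0: \beta = 0$ can be sketched as follows. The function name and data are hypothetical, and the slope is computed here as $\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, which is algebraically the same as $b = r s_y / s_x$.

```python
import math
import statistics

def slope_inference(x, y):
    """Return (b, SE_b, t, df) for testing H0: beta = 0 in simple regression."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    # Standard error about the regression line, with n - 2 degrees of freedom
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b = s / math.sqrt(sxx)
    return b, se_b, b / se_b, n - 2

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b, t_stat, df = slope_inference(x, y)
print(b, se_b, t_stat, df)
```

A confidence interval would then be `b - t_star * se_b` to `b + t_star * se_b`, with `t_star` taken from a t table for `df` degrees of freedom.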

Inference for Regression Analysis: Multiple Linear Regression: Confidence Intervals and Hypothesis Tests for a Slope Coefficient
- The standard errors of individual slope coefficients $b_1, b_2, \ldots, b_k$ are found using a computer program.
- Then, to test, e.g., $H_0: \beta_1 = 0$, we calculate
  $$t = \frac{b_1}{SE_{b_1}}$$
  which follows a t distribution with $n - k - 1$ degrees of freedom under $H_0$.
- To construct a confidence interval, e.g., for $\beta_1$, find the critical value $t^*$ from the t-distribution with $n - k - 1$ degrees of freedom. Then calculate
  $$\text{estimate} \pm t^* SE = b_1 \pm t^* SE_{b_1}$$

Inference for Regression Analysis: Checking the Assumptions of the Regression Model
- Linearity: Plot the residuals against each $x$.
- Constant spread: Plot the residuals against the fitted values $\hat{y}$.
- Normality: Examine a histogram of the residuals.

One-Way Analysis of Variance
- One-way analysis of variance (ANOVA) is used to test the null hypothesis that several population means are equal:
  $$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$$
  against the alternative hypothesis
  $$H_a: \text{not all of } \mu_1, \mu_2, \ldots, \mu_I \text{ are equal}$$

One-Way Analysis of Variance: Assumptions
1. We have $I$ independent SRSs, one from each population.
2. Each population is normally distributed, with unknown (and potentially different) means $\mu_i$, but the same unknown standard deviation $\sigma$.
3. The ANOVA F-test is approximately correct if the largest of the sample standard deviations is no more than twice as large as the smallest of the sample standard deviations.

One-Way Analysis of Variance: Calculating the F Test Statistic
1. Find the sum of squares for groups
   $$SSG = \sum n_i (\bar{x}_i - \bar{x})^2$$
   where $n_i$ is the size of the $i$th sample; $\bar{x}_i$ is the mean for the $i$th sample; $\bar{x}$ is the mean for all of the data. The degrees of freedom for SSG is $I - 1$.
2. The mean square for groups is
   $$MSG = \frac{SSG}{I - 1}$$
3. Find the sum of squares for error
   $$SSE = \sum (n_i - 1) s_i^2$$
   where $s_i^2$ is the variance in sample $i$. The sum of squares for error has $N - I$ degrees of freedom (careful: not $N - 1$), where $N = \sum n_i$ is the total sample size.
4. The mean square for error is
   $$MSE = \frac{SSE}{N - I}$$
5. The test statistic
   $$F = \frac{MSG}{MSE}$$
   follows the F distribution with $I - 1$ degrees of freedom in the numerator and $N - I$ degrees of freedom in the denominator.

One-Way Analysis of Variance: The ANOVA Table
It is convenient to organize the calculation of the one-way analysis of variance F-test in an ANOVA table:

Source    df       SS                                  MS            F
Groups    I - 1    $\sum n_i (\bar{x}_i - \bar{x})^2$  SSG/(I - 1)   MSG/MSE
Error     N - I    $\sum (n_i - 1) s_i^2$              SSE/(N - I)
Total     N - 1    SSG + SSE
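The entries of the ANOVA table can be computed from group sizes, means, and standard deviations. The following is a minimal sketch; the function name and the three hypothetical groups are illustrative, and the P-value would still have to come from an F table or software.

```python
def one_way_anova(ns, means, sds):
    """Return the ANOVA table entries from per-group summary statistics."""
    I = len(ns)   # number of groups
    N = sum(ns)   # total sample size
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s**2 for n, s in zip(ns, sds))
    msg = ssg / (I - 1)
    mse = sse / (N - I)   # N - I degrees of freedom, not N - 1
    return {
        "df_groups": I - 1, "df_error": N - I, "df_total": N - 1,
        "SSG": ssg, "SSE": sse, "SST": ssg + sse,
        "MSG": msg, "MSE": mse, "F": msg / mse,
    }

# Hypothetical summary statistics for I = 3 groups of 5 observations each
table = one_way_anova(ns=[5, 5, 5], means=[10, 12, 14], sds=[2, 2, 2])
print(table)
```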