Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance. ECON 509. Dr. Mohammad Zainal


Department of Economics Business Statistics Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal

Chapter Goals. After completing this chapter, you should be able to: set up a contingency analysis table and perform a chi-square test of independence; recognize situations in which to use analysis of variance; and perform a single-factor hypothesis test and interpret the results.

Contingency Tables. Situations involving multiple population proportions. Used to classify sample observations according to two or more characteristics. Also called a cross-tabulation table. Example:
              Male   Female   Total
Smoke          150       20     170
Don't Smoke     70      160     230
Total          220      180     400

Contingency Tables. A contingency table can be of any size: 2x3, 3x2, 3x3, or 4x2. The first digit refers to the number of rows, and the second refers to the number of columns. In general, an R x C table contains R rows and C columns. The number of cells in a contingency table is obtained by multiplying the number of rows by the number of columns.

Contingency Tables. We may want to know if there is an association between being male or female and smoking or not. The null hypothesis is that the two characteristics of the elements of a given population are not related (independent), against the alternative hypothesis that the two characteristics are related (dependent). Degrees of freedom for the test are: df = (R-1)(C-1).

Contingency Table Example: Left-Handed vs. Gender. Dominant hand: left vs. right; gender: male vs. female. H0: Hand preference is independent of gender. HA: Hand preference is not independent of gender.

Contingency Table Example (continued). Sample results organized in a contingency table: sample size n = 300; 120 females, 12 of whom were left-handed; 180 males, 24 of whom were left-handed.
                 Hand Preference
Gender        Left   Right   Total
Female          12     108     120
Male            24     156     180
Total           36     264     300

Logic of the Test H 0 : Hand preference is independent of gender H A : Hand preference is not independent of gender If H 0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males The two proportions above should be the same as the proportion of left-handed people overall

Finding Expected Frequencies. 120 females, 12 were left-handed; 180 males, 24 were left-handed. Overall: P(Left-Handed) = 36/300 = .12. If independent, then P(Left-Handed | Female) = P(Left-Handed | Male) = .12. So we would expect 12% of the 120 females and 12% of the 180 males to be left-handed, i.e., we would expect (120)(.12) = 14.4 females and (180)(.12) = 21.6 males to be left-handed.

Expected Cell Frequencies (continued). Expected cell frequencies: e_ij = (i-th row total)(j-th column total) / (total sample size). Example: e_11 = (120)(36) / 300 = 14.4.
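
Since the same marginal totals feed every cell, the whole expected-frequency table can be generated at once. A minimal Python sketch (numpy assumed available; the variable names are illustrative, not from the slides):

```python
import numpy as np

# Observed counts for the hand-preference example (rows: Female, Male; cols: Left, Right)
observed = np.array([[12, 108],
                     [24, 156]])

row_totals = observed.sum(axis=1)   # [120, 180]
col_totals = observed.sum(axis=0)   # [ 36, 264]
n = observed.sum()                  # 300

# e_ij = (row i total)(column j total) / n, computed for every cell at once
expected = np.outer(row_totals, col_totals) / n
print(expected)                     # [[ 14.4 105.6]
                                    #  [ 21.6 158.4]]
```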

Observed vs. Expected Frequencies. Observed frequencies vs. expected frequencies:
                    Hand Preference
Gender     Left                     Right                      Total
Female     Observed = 12            Observed = 108             120
           Expected = 14.4          Expected = 105.6
Male       Observed = 24            Observed = 156             180
           Expected = 21.6          Expected = 158.4
Total      36                       264                        300

The Chi-Square Test Statistic. The chi-square contingency test statistic is
χ² = Σ_{i=1..r} Σ_{j=1..c} (o_ij − e_ij)² / e_ij, with d.f. = (r − 1)(c − 1),
where o_ij = observed frequency in cell (i, j), e_ij = expected frequency in cell (i, j), r = number of rows, and c = number of columns.
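
The double sum translates directly into code. A short, self-contained sketch of the statistic for the hand-preference table (again assuming numpy; not part of the original slides):

```python
import numpy as np

observed = np.array([[12, 108],
                     [24, 156]], dtype=float)
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# chi-square statistic: sum over all cells of (o_ij - e_ij)^2 / e_ij
chi_sq = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(round(chi_sq, 4), df)   # 0.7576 1
```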

Observed vs. Expected Frequencies (continued).
Gender     Left                     Right                      Total
Female     Observed = 12            Observed = 108             120
           Expected = 14.4          Expected = 105.6
Male       Observed = 24            Observed = 156             180
           Expected = 21.6          Expected = 158.4
Total      36                       264                        300
χ² = (12 − 14.4)²/14.4 + (108 − 105.6)²/105.6 + (24 − 21.6)²/21.6 + (156 − 158.4)²/158.4 = 0.7576

Contingency Analysis. χ² = 0.7576 with d.f. = (r − 1)(c − 1) = (1)(1) = 1. Decision rule: with α = 0.05 the critical value is χ²(0.05, 1) = 3.841; if χ² > 3.841, reject H0, otherwise do not reject H0. Here, χ² = 0.7576 < 3.841, so we do not reject H0 and conclude that gender and hand preference are independent.
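
The same decision can be reproduced with scipy, which also reports a p-value; a sketch assuming scipy is available (`correction=False` turns off Yates' continuity correction so the statistic matches the hand calculation above):

```python
import numpy as np
from scipy import stats

observed = np.array([[12, 108],
                     [24, 156]])

chi_sq, p_value, df, expected = stats.chi2_contingency(observed, correction=False)
critical = stats.chi2.ppf(1 - 0.05, df)   # 3.841 for df = 1 at alpha = 0.05

print(round(chi_sq, 4), round(critical, 3), round(p_value, 3))
# 0.7576 3.841 0.384  -> chi_sq < critical, so do not reject H0
```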

Problem. A random sample of 300 adults was selected, and they were asked if they favor giving more freedom to schoolteachers to punish students for violence and lack of discipline. Based on the results of the survey, the two-way classification of the responses of these adults is presented in the following table. Test, at the 1% significance level, whether the respondents' gender and their opinions are independent.
             In Favor (F)   Against (A)   No Opinion (N)
Men (M)                93            70               12
Women (W)              87            32                6

Step 1. State the null and alternative hypotheses. H0: the two attributes are independent. H1: the two attributes are dependent. Step 2. Select the distribution to use. We use the chi-square distribution to make a test of independence for a contingency table.

Step 3. Determine the rejection and nonrejection regions. The area of the rejection region is .01, and it falls in the right tail of the chi-square distribution curve. The degrees of freedom are df = (2 − 1)(3 − 1) = 2, so the critical value is χ²(.01, 2) = 9.210.

e_ij = (i-th row total)(j-th column total) / (total sample size). Observed and (expected) frequencies:
               In Favor (F)   Against (A)   No Opinion (N)   Row total
Men (M)            93 (105)     70 (59.5)        12 (10.5)         175
Women (W)           87 (75)     32 (42.5)          6 (7.5)         125
Column total            180           102               18         300

Step 4. Calculate the value of the test statistic: χ² = Σ (o − e)²/e = (93 − 105)²/105 + (70 − 59.5)²/59.5 + (12 − 10.5)²/10.5 + (87 − 75)²/75 + (32 − 42.5)²/42.5 + (6 − 7.5)²/7.5 = 8.25. Step 5. Make a decision. The value of the test statistic, χ² = 8.25, is less than the critical value of χ² = 9.210, and it falls in the nonrejection region. So, we fail to reject H0.
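
For the 2x3 survey table the same scipy call reproduces the hand calculation; a hedged sketch (scipy assumed available):

```python
import numpy as np
from scipy import stats

# rows: Men, Women; columns: In Favor, Against, No Opinion
observed = np.array([[93, 70, 12],
                     [87, 32,  6]])

chi_sq, p_value, df, expected = stats.chi2_contingency(observed)
critical = stats.chi2.ppf(1 - 0.01, df)   # 9.210 for df = 2 at alpha = 0.01

print(round(chi_sq, 2), df, round(critical, 3))
# 8.25 2 9.21  -> 8.25 < 9.210, so fail to reject H0
```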

Chapter Overview. Analysis of Variance (ANOVA): One-Way ANOVA (F-test, Tukey-Kramer test); Randomized Complete Block ANOVA (F-test, Fisher's Least Significant Difference test); Two-factor ANOVA with replication.

General ANOVA Setting. The investigator controls one or more independent variables, called factors (or treatment variables); each factor contains two or more levels (or categories/classifications). We observe the effect on the dependent variable, i.e., the response to the levels of the independent variable. Experimental design: the plan used to test the hypothesis.

One-Way Analysis of Variance. Evaluates the difference among the means of three or more populations. Examples: accident rates for the 1st, 2nd, and 3rd shifts; expected mileage for five brands of tires. Assumptions: populations are normally distributed; populations have equal variances; samples are randomly and independently drawn.

Completely Randomized Design. Experimental units (subjects) are assigned randomly to treatments. There is only one factor or independent variable, with two or more treatment levels. Analyzed by one-factor analysis of variance (one-way ANOVA). Called a balanced design if all factor levels have equal sample sizes.

Hypotheses of One-Way ANOVA. H0: μ1 = μ2 = μ3 = ... = μk, i.e., all population means are equal (no treatment effect, no variation in means among groups). HA: Not all of the population means are the same, i.e., at least one population mean is different (there is a treatment effect). This does not mean that all population means are different (some pairs may be the same).

One-Factor ANOVA. H0: μ1 = μ2 = μ3 = ... = μk; HA: Not all μ_i are the same. All means are the same: the null hypothesis is true (no treatment effect): μ1 = μ2 = μ3.

One-Factor ANOVA (continued). H0: μ1 = μ2 = μ3 = ... = μk; HA: Not all μ_i are the same. At least one mean is different: the null hypothesis is not true (a treatment effect is present), e.g., μ1 = μ2 ≠ μ3 or μ1 ≠ μ2 ≠ μ3.

Partitioning the Variation Total variation can be split into two parts: SST = SSB + SSW SST = Total Sum of Squares SSB = Sum of Squares Between SSW = Sum of Squares Within

Partitioning the Variation SST = SSB + SSW (continued) Total Variation (SST) = the aggregate dispersion of the individual data values across the various factor levels Between-Sample Variation (SSB) = dispersion among the factor sample means Within-Sample Variation (SSW) = dispersion that exists among the data values within a particular factor level

Partition of Total Variation. Total Variation (SST) = Variation Due to Factor (SSB) + Variation Due to Random Sampling (SSW). SSB is commonly referred to as: sum of squares between, sum of squares among, sum of squares explained, or among-groups variation. SSW is commonly referred to as: sum of squares within, sum of squares error, sum of squares unexplained, or within-groups variation.

Total Sum of Squares. SST = SSB + SSW, where SST = Σ_{i=1..k} Σ_{j=1..n_i} (x_ij − x̄)², SST = total sum of squares, k = number of populations (levels or treatments), n_i = sample size from population i, x_ij = j-th measurement from population i, and x̄ = grand mean (mean of all data values).

Total Variation (continued). SST = (x_11 − x̄)² + (x_12 − x̄)² + ... + (x_{k n_k} − x̄)². [Figure: the individual responses X in Group 1, Group 2, and Group 3 plotted around the grand mean x̄.]

Sum of Squares Between. SST = SSB + SSW, where SSB = Σ_{i=1..k} n_i (x̄_i − x̄)², SSB = sum of squares between, k = number of populations, n_i = sample size from population i, x̄_i = sample mean from population i, and x̄ = grand mean (mean of all data values).

Between-Group Variation. SSB = Σ_{i=1..k} n_i (x̄_i − x̄)² measures the variation due to differences among groups. Mean square between = SSB / degrees of freedom: MSB = SSB / (k − 1).

Between-Group Variation (continued). SSB = n_1 (x̄_1 − x̄)² + n_2 (x̄_2 − x̄)² + ... + n_k (x̄_k − x̄)². [Figure: the group means x̄_1, x̄_2, x̄_3 plotted against the grand mean x̄ for Group 1, Group 2, and Group 3.]

Sum of Squares Within. SST = SSB + SSW, where SSW = Σ_{i=1..k} Σ_{j=1..n_i} (x_ij − x̄_i)², SSW = sum of squares within, k = number of populations, n_i = sample size from population i, x̄_i = sample mean from population i, and x_ij = j-th measurement from population i.

Within-Group Variation. SSW = Σ_{i=1..k} Σ_{j=1..n_i} (x_ij − x̄_i)²: summing the variation within each group and then adding over all groups. Mean square within = SSW / degrees of freedom: MSW = SSW / (n_T − k).

Within-Group Variation (continued). SSW = (x_11 − x̄_1)² + (x_12 − x̄_1)² + ... + (x_{k n_k} − x̄_k)². [Figure: the individual responses plotted around their own group means x̄_1, x̄_2, x̄_3 for Group 1, Group 2, and Group 3.]
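
The three sums of squares can be computed directly from grouped data, which also verifies the partition SST = SSB + SSW. A minimal sketch (numpy assumed; the sample data are placeholders, not from the slides):

```python
import numpy as np

def anova_sums_of_squares(groups):
    """Return SST, SSB, SSW for a list of samples, one per factor level."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    sst = ((all_values - grand_mean) ** 2).sum()                          # total
    ssb = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)    # between
    ssw = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)  # within
    return sst, ssb, ssw

# Illustrative data: three small groups (placeholder values)
groups = [[3, 5, 4], [8, 9, 7], [6, 6, 5]]
sst, ssb, ssw = anova_sums_of_squares(groups)
print(sst, ssb, ssw, np.isclose(sst, ssb + ssw))   # partition check: SST = SSB + SSW
```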

One-Way ANOVA Table.
Source of Variation    SS               df        MS                    F ratio
Between Samples        SSB              k − 1     MSB = SSB/(k − 1)     F = MSB/MSW
Within Samples         SSW              n_T − k   MSW = SSW/(n_T − k)
Total                  SST = SSB+SSW    n_T − 1
k = number of populations; n_T = sum of the sample sizes from all populations; df = degrees of freedom.

One-Factor ANOVA F Test Statistic. H0: μ1 = μ2 = ... = μk; HA: At least two population means are different. Test statistic: F = MSB / MSW, where MSB is the mean square between and MSW is the mean square within. Degrees of freedom: df1 = k − 1 (k = number of populations) and df2 = n_T − k (n_T = sum of sample sizes from all populations).
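
Putting the pieces together, the F ratio and its critical value follow in a few lines; a sketch with placeholder inputs (scipy assumed available):

```python
from scipy import stats

# Placeholder inputs (not from the slides): k groups, n_T total observations
k, n_T = 3, 15
ssb, ssw = 120.0, 240.0            # sums of squares from the partition step

msb = ssb / (k - 1)                # mean square between, df1 = k - 1
msw = ssw / (n_T - k)              # mean square within,  df2 = n_T - k
F = msb / msw                      # 3.0 for these placeholder values

f_crit = stats.f.ppf(1 - 0.05, k - 1, n_T - k)    # 3.885 for (2, 12) at alpha = .05
print(round(F, 3), round(f_crit, 3), F > f_crit)  # reject H0 only if F > f_crit
```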

Interpreting the One-Factor ANOVA F Statistic. The F statistic is the ratio of the between estimate of variance to the within estimate of variance. The ratio must always be positive; df1 = k − 1 will typically be small, while df2 = n_T − k will typically be large. The ratio should be close to 1 if H0: μ1 = μ2 = ... = μk is true, and will be larger than 1 if H0 is false.

One-Factor ANOVA F Test Example. You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?
Club 1   Club 2   Club 3
   254      234      200
   263      218      222
   241      235      197
   237      227      206
   251      216      204

One-Factor ANOVA Example: Scatter Diagram. [Figure: the fifteen distances (roughly 190 to 270 yards) plotted by club, with each club's observations scattered around its sample mean.] Sample means: x̄_1 = 249.2, x̄_2 = 226.0, x̄_3 = 205.8; grand mean x̄ = 227.0.

One-Factor ANOVA Example Computations. x̄_1 = 249.2, x̄_2 = 226.0, x̄_3 = 205.8, x̄ = 227.0; n_1 = 5, n_2 = 5, n_3 = 5, n_T = 15, k = 3.
SSB = 5[(249.2 − 227)² + (226 − 227)² + (205.8 − 227)²] = 4716.4
SSW = (254 − 249.2)² + (263 − 249.2)² + ... + (204 − 205.8)² = 1119.6
MSB = 4716.4 / (3 − 1) = 2358.2
MSW = 1119.6 / (15 − 3) = 93.3
F = 2358.2 / 93.3 = 25.275
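
The hand calculation can be checked quickly in Python; a sketch that reproduces SSB, SSW, and F for the three clubs (numpy assumed available):

```python
import numpy as np

club1 = np.array([254, 263, 241, 237, 251])
club2 = np.array([234, 218, 235, 227, 216])
club3 = np.array([200, 222, 197, 206, 204])
groups = [club1, club2, club3]

grand_mean = np.concatenate(groups).mean()                        # 227.0
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # 4716.4
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)            # 1119.6

msb = ssb / (3 - 1)          # 2358.2
msw = ssw / (15 - 3)         # 93.3
print(round(msb / msw, 3))   # F = 25.275
```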

One-Factor ANOVA Example Solution. H0: μ1 = μ2 = μ3; HA: μ_i not all equal; α = .05; df1 = 2, df2 = 12. Critical value: F(.05) = 3.885. Test statistic: F = MSB / MSW = 2358.2 / 93.3 = 25.275. Decision: since F = 25.275 > 3.885, reject H0 at α = 0.05. Conclusion: there is evidence that at least one μ_i differs from the rest.

ANOVA -- Single Factor: Excel Output. EXCEL: Tools > Data Analysis > ANOVA: Single Factor.
SUMMARY
Groups    Count    Sum    Average   Variance
Club 1        5   1246      249.2      108.2
Club 2        5   1130      226.0       77.5
Club 3        5   1029      205.8       94.2
ANOVA
Source of Variation          SS   df        MS        F    P-value   F crit
Between Groups           4716.4    2    2358.2   25.275   4.99E-05    3.885
Within Groups            1119.6   12      93.3
Total                    5836.0   14
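
The same output can be obtained without Excel; a sketch using scipy's one-way ANOVA helper (values match the table above up to rounding):

```python
from scipy import stats

club1 = [254, 263, 241, 237, 251]
club2 = [234, 218, 235, 227, 216]
club3 = [200, 222, 197, 206, 204]

F, p_value = stats.f_oneway(club1, club2, club3)
f_crit = stats.f.ppf(1 - 0.05, 2, 12)

print(round(F, 3), f"{p_value:.2e}", round(f_crit, 3))
# 25.275 4.99e-05 3.885
```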

Chapter Summary. Described one-way analysis of variance: the logic of ANOVA, ANOVA assumptions, the F test for differences in k means, and the Tukey-Kramer procedure for multiple comparisons. Described randomized complete block designs: the F test and Fisher's least significant difference test for multiple comparisons. Described two-way analysis of variance: examined the effects of multiple factors and their interaction.

Problems. Fifteen fourth-grade students were randomly assigned to three groups to experiment with three different methods of teaching arithmetic. At the end of the semester, the same test was given to all 15 students. The table gives the scores of students in the three groups. At the 1% significance level, can we reject the null hypothesis that the mean arithmetic score of all fourth-grade students taught by each of these three methods is the same? Assume that all the assumptions required to apply the one-way ANOVA procedure hold true.
Method I   Method II   Method III
      48          55           84
      73          85           68
      51          70           95
      65          69           74
      87          90           67

Step 1. State the null and alternative hypotheses. Let μ1, μ2, and μ3 be the mean arithmetic scores of all fourth-grade students who are taught, respectively, by Methods I, II, and III. The null and alternative hypotheses are H0: μ1 = μ2 = μ3 vs. H1: Not all the means are equal. Step 2. Select the distribution to use. Since we are comparing the means of three normally distributed populations, we use the F distribution to make this test.

Step 3. Determine the rejection and nonrejection regions. The significance level is .01. The ANOVA test is always right-tailed, so the area in the right tail of the F distribution curve is .01. Numerator degrees of freedom = k − 1 = 3 − 1 = 2. Denominator degrees of freedom = n − k = 15 − 3 = 12. The critical value is F(.01, 2, 12) = 6.93.

Step 4. Calculate the value of the test statistic.
T_1 = 48 + 73 + 51 + 65 + 87 = 324
T_2 = 55 + 85 + 70 + 69 + 90 = 369
T_3 = 84 + 68 + 95 + 74 + 67 = 388
Σx = T_1 + T_2 + T_3 = 1081, n = 5 + 5 + 5 = 15
Σx² = (48)² + (73)² + (51)² + ... + (74)² + (67)² = 80,709
SSB = (T_1²/n_1 + T_2²/n_2 + T_3²/n_3) − (Σx)²/n = (324²/5 + 369²/5 + 388²/5) − (1081)²/15 = 432.1333
SSW = Σx² − (T_1²/n_1 + T_2²/n_2 + T_3²/n_3) = 80,709 − (324²/5 + 369²/5 + 388²/5) = 2372.8000

MSB = SSB / (k − 1) = 432.1333 / (3 − 1) = 216.0667
MSW = SSW / (n − k) = 2372.8000 / (15 − 3) = 197.7333
F = MSB / MSW = 216.0667 / 197.7333 = 1.09
Step 5. Make a decision. The test statistic F = 1.09 is less than the critical value F(.01, 2, 12) = 6.93, so it falls in the nonrejection region. We fail to reject the null hypothesis and conclude that the means of the three populations are equal.
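
A short sketch that reproduces the shortcut-formula calculation above (plain Python; T_i are the group totals and Σx² the sum of squared scores):

```python
# Shortcut-formula check for the teaching-methods example
method1 = [48, 73, 51, 65, 87]
method2 = [55, 85, 70, 69, 90]
method3 = [84, 68, 95, 74, 67]
groups = [method1, method2, method3]

n = sum(len(g) for g in groups)                  # 15
grand_total = sum(sum(g) for g in groups)        # 1081
sum_sq = sum(x * x for g in groups for x in g)   # 80,709

between_part = sum(sum(g) ** 2 / len(g) for g in groups)   # sum of T_i^2 / n_i
ssb = between_part - grand_total ** 2 / n                   # 432.1333
ssw = sum_sq - between_part                                 # 2372.8

msb, msw = ssb / (3 - 1), ssw / (n - 3)
print(round(ssb, 4), round(ssw, 4), round(msb / msw, 2))    # 432.1333 2372.8 1.09
```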

For convenience, all these calculations are often recorded in a table called the ANOVA table.
Anova: Single Factor
SUMMARY
Groups        Count   Sum   Average   Variance
Method I          5   324      64.8      258.2
Method II         5   369      73.8      194.7
Method III        5   388      77.6      140.3
ANOVA
Source of Variation          SS   df         MS       F   P-value   F crit
Between Groups         432.1333    2   216.0667   1.093     0.366    6.927
Within Groups            2372.8   12   197.7333
Total                 2804.9333   14
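
As with the golf-club example, scipy reproduces this ANOVA table directly; a sketch (scipy assumed available):

```python
from scipy import stats

method1 = [48, 73, 51, 65, 87]
method2 = [55, 85, 70, 69, 90]
method3 = [84, 68, 95, 74, 67]

F, p_value = stats.f_oneway(method1, method2, method3)
f_crit = stats.f.ppf(1 - 0.01, 2, 12)   # 1% significance level

print(round(F, 2), round(p_value, 3), round(f_crit, 3))
# 1.09 0.366 6.927
```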

Copyright. The materials of this presentation were mostly taken from the PowerPoint files accompanying Business Statistics: A Decision-Making Approach, 7e, 2008 Prentice-Hall, Inc.