STP 226 ELEMENTARY STATISTICS NOTES

PART IV INFERENTIAL STATISTICS

CHAPTER 12 CHI-SQUARE PROCEDURES

12.1 The Chi-Square Distribution

A variable has a chi-square distribution if its distribution has the shape of a right-skewed curve called a chi-square ($\chi^2$) curve.

Basic Properties of $\chi^2$ Curves
1. The total area under a $\chi^2$ curve is equal to 1.
2. A $\chi^2$ curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis as it does so.
3. A $\chi^2$ curve is right-skewed.
4. As the number of degrees of freedom becomes larger, $\chi^2$ curves look increasingly like normal curves.
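As a rough numerical illustration of properties 1, 3, and 4, the Python sketch below integrates the standard chi-square density and evaluates the standard skewness formula $\sqrt{8/df}$. Neither formula appears in these notes; both are brought in here as assumptions, and the names `chi2_pdf`, `area_under_curve`, and `skewness` are illustrative only.

```python
import math

def chi2_pdf(x, df):
    # Standard chi-square density for x > 0 (assumed; not given in the notes).
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))

def area_under_curve(df, upper=200.0, steps=100_000):
    # Midpoint-rule approximation of the area between 0 and `upper`.
    # The cutoff of 200 is an arbitrary choice that is far into the tail
    # for the small df values used below.
    h = upper / steps
    return h * sum(chi2_pdf((i + 0.5) * h, df) for i in range(steps))

def skewness(df):
    # Standard result (assumed): skewness of a chi-square curve is sqrt(8 / df).
    return math.sqrt(8 / df)

for df in (2, 5, 10, 50):
    # Area should be close to 1; skewness should shrink toward 0 as df grows.
    print(df, round(area_under_curve(df), 3), round(skewness(df), 3))
```

The skewness stays positive (right-skewed) but shrinks as the degrees of freedom grow, which is one way to read property 4.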

Using the $\chi^2$ Table

The table (Table V) gives the area to the right of a $\chi^2$ value. For a specified number of degrees of freedom, $\chi^2_\alpha$ denotes the $\chi^2$ value that has an area of $\alpha$ to its right.

12.2 Chi-Square Goodness-of-Fit Test

Chi-square goodness-of-fit test: a procedure used to perform a hypothesis test about the distribution of a qualitative (categorical) variable or a discrete quantitative variable having only finitely many possible values.

Expected frequency: $E = np$, where $n$ is the sample size and $p$ is the relative frequency.

For a sample of size $n = 500$:

Type of violent crime   Relative frequency ($p$)   Expected frequency ($E = np$)
Murder                  0.012                      500 · 0.012 =   6.0
Forcible rape           0.054                      500 · 0.054 =  27.0
Robbery                 0.323                      500 · 0.323 = 161.5
Agg. assault            0.611                      500 · 0.611 = 305.5

Type of           Observed    Expected    Difference   Square of       Chi-square
violent crime     frequency   frequency                difference      subtotal
$x$               $O$         $E$         $O - E$      $(O - E)^2$     $(O - E)^2 / E$
Murder                9           6.0         3.0          9.00            1.500
Forcible rape        26          27.0        -1.0          1.00            0.037
Robbery             144         161.5       -17.5        306.25            1.896
Agg. assault        321         305.5        15.5        240.25            0.786
SUMS                500         500.0         0                            4.219
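The arithmetic in the table above can be checked with a short Python sketch. It uses only the relative and observed frequencies from the example; the variable names (`crime_data`, `chi_square`) are illustrative and not part of the notes.

```python
n = 500  # sample size from the example

# category: (relative frequency p, observed frequency O)
crime_data = {
    "Murder": (0.012, 9),
    "Forcible rape": (0.054, 26),
    "Robbery": (0.323, 144),
    "Agg. assault": (0.611, 321),
}

chi_square = 0.0
for crime, (p, observed) in crime_data.items():
    expected = n * p                        # E = np
    difference = observed - expected        # O - E
    subtotal = difference ** 2 / expected   # (O - E)^2 / E
    chi_square += subtotal
    print(f"{crime:<14} {observed:>4} {expected:>6.1f} {difference:>6.1f} {subtotal:>6.3f}")

print(f"chi-square statistic: {chi_square:.3f}")
```

Summing the unrounded subtotals gives about 4.220; the 4.219 in the table comes from adding the subtotals after each has been rounded to three decimals.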

Distribution of the $\chi^2$ Statistic for a Chi-Square Goodness-of-Fit Test

For a chi-square goodness-of-fit test, the test statistic $\chi^2 = \sum (O - E)^2 / E$ has approximately a chi-square distribution if the null hypothesis is true. The number of degrees of freedom is one less than the number of possible values for the variable under consideration.

The Chi-Square Goodness-of-Fit Test

ASSUMPTIONS
1. All expected frequencies are 1 or greater.
2. At most 20% of the expected frequencies are less than 5.

STEPS
1. The null and alternative hypotheses are
   $H_0$: The variable under consideration has the specified distribution.
   $H_a$: The variable under consideration does not have the specified distribution.
2. Calculate the expected frequency for each possible value of the variable under consideration using the formula $E = np$, where $n$ is the sample size and $p$ is the relative frequency (or probability) given for the value in the null hypothesis.
3. Check whether the expected frequencies satisfy Assumptions 1 and 2. If they do not, this procedure should not be used.
4. Decide on the significance level, $\alpha$.

5. The critical value is $\chi^2_\alpha$ with $df = k - 1$, where $k$ is the number of possible values for the variable under consideration. Use Table V to find the critical value.
6. Compute the value of the test statistic $\chi^2 = \sum (O - E)^2 / E$, where $O$ and $E$ denote observed and expected frequencies, respectively.
7. If the value of the test statistic falls in the rejection region, reject $H_0$; otherwise, do not reject $H_0$.
8. State the conclusion in words.

With the p-value approach, the critical value and rejection region are not needed: the p-value of the test statistic is compared directly with the significance level $\alpha$, and $H_0$ is rejected if the p-value is less than or equal to $\alpha$.

12.3 Contingency Tables; Association

Chi-square independence test: a chi-square procedure for deciding whether two variables are associated (Section 12.4).

Contingency Tables

Univariate data: data values of one variable of a population.
Bivariate data: data values of two variables of a population.
Contingency table (two-way table): a frequency distribution for bivariate data.
Cell: the intersection of a row and a column in a contingency table.

Example of a two-way table:

                  Class Level
Political Party   Freshmen   Sophomore   Junior   Senior   Total
Democratic            1          4          5        3       13
Republican            4          8          4        2       18
Other                 1          3          3        2        9
Total                 6         15         12        7       40

Association: two variables are associated when knowing the value of one variable imparts information about the value of the other. Here the two variables are categorical, or quantitative with only finitely many possible values.

                  Class Level
Political Party   Freshmen   Sophomore   Junior   Senior   Total
Democratic          0.167      0.267      0.417    0.429   0.325
Republican          0.667      0.533      0.333    0.286   0.450
Other               0.167      0.200      0.250    0.286   0.225
Total               1.000      1.000      1.000    1.000   1.000

Conditional distribution: the distribution of political party within each class level (freshmen, sophomores, juniors, or seniors). For example, freshmen are divided among the three political party categories as 16.7% Democratic, 66.7% Republican, and 16.7% Other. This is called the conditional distribution of political party for freshmen. The same applies to the students in the other class levels.

Marginal distribution: the Total column gives the unconditional distribution of political party (32.5% Democratic, 45% Republican, 22.5% Other).

Segmented bar graph: can be used to visualize the concept of association. The first four segmented bars give the conditional distributions of political party for freshmen, sophomores, juniors, and seniors.

The last segmented bar gives the marginal distribution of political party.

There is an association between two variables of a population if the conditional distributions of one variable given the other are not identical.

Statistically dependent: there is an association between the two variables.
Statistically independent: there is no association between the two variables.
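The conditional and marginal distributions for the political party example can be recomputed from the observed counts with a few lines of Python. This is only a sketch of the column-by-column division described above; the names `counts`, `conditional`, and `marginal` are illustrative.

```python
# Observed counts from the two-way table: rows are political parties,
# columns are class levels (Freshmen, Sophomore, Junior, Senior).
counts = {
    "Democratic": [1, 4, 5, 3],
    "Republican": [4, 8, 4, 2],
    "Other": [1, 3, 3, 2],
}

column_totals = [sum(col) for col in zip(*counts.values())]  # per class level
n = sum(column_totals)                                       # all students

# Conditional distribution of party within each class level:
# each count divided by its column total.
conditional = {
    party: [round(o / c, 3) for o, c in zip(row, column_totals)]
    for party, row in counts.items()
}

# Marginal distribution of party: each row total divided by n.
marginal = {party: round(sum(row) / n, 3) for party, row in counts.items()}

for party in counts:
    print(party, conditional[party], marginal[party])
```

If every column of `conditional` matched `marginal`, the sample would show no association; here the columns differ, which is what the segmented bar graph displays.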

12.4 Chi-Square Independence Test

Distribution of the $\chi^2$ Statistic for a Chi-Square Independence Test

For a chi-square independence test, the test statistic $\chi^2 = \sum (O - E)^2 / E$ has approximately a chi-square distribution if the null hypothesis of nonassociation is true. The number of degrees of freedom is $(r - 1)(c - 1)$, where $r$ and $c$ are the numbers of possible values for the two variables under consideration.

The Chi-Square Independence Test

ASSUMPTIONS
1. All expected frequencies are 1 or greater.
2. At most 20% of the expected frequencies are less than 5.

STEPS
1. The null and alternative hypotheses are
   $H_0$: The two variables under consideration are not associated.
   $H_a$: The two variables under consideration are associated.
2. Calculate the expected frequencies using the formula $E = \frac{R \cdot C}{n}$, where $R$ = row total, $C$ = column total, and $n$ = sample size. Place each expected frequency below its corresponding observed frequency in the contingency table.
3. Check whether the expected frequencies satisfy Assumptions 1 and 2. If they do not, this procedure should not be used.
4. Decide on the significance level, $\alpha$.
5. The critical value is $\chi^2_\alpha$ with $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of possible values for the two variables under consideration. Use Table V to find the critical value.
6. Compute the value of the test statistic $\chi^2 = \sum (O - E)^2 / E$, where $O$ and $E$ represent observed and expected frequencies, respectively.

7. If the value of the test statistic falls in the rejection region, reject $H_0$; otherwise, do not reject $H_0$.
8. State the conclusion in words.
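Steps 2, 3, and 6 of the independence test can be sketched in Python as follows, applied to the political party by class level table from Section 12.3. The function name `independence_test` and its return layout are hypothetical choices for this illustration. The critical value in step 5 would still be read from Table V.

```python
def independence_test(table):
    """Expected frequencies, assumption checks, test statistic, and df
    for a two-way table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Step 2: E = R * C / n for every cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 3: check Assumptions 1 and 2 on the expected frequencies.
    cells = [e for row in expected for e in row]
    assumption_1 = all(e >= 1 for e in cells)
    assumption_2 = sum(e < 5 for e in cells) <= 0.20 * len(cells)

    # Step 6: chi-square statistic, summed over all cells.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (r - 1)(c - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, assumption_1, assumption_2, statistic, df


# Rows: Democratic, Republican, Other; columns: the four class levels.
party_by_class = [
    [1, 4, 5, 3],
    [4, 8, 4, 2],
    [1, 3, 3, 2],
]

expected, a1, a2, statistic, df = independence_test(party_by_class)
print("Assumption 1 holds:", a1)
print("Assumption 2 holds:", a2)
print("df:", df, "statistic:", round(statistic, 3))
```

For this 40-student table, 10 of the 12 expected frequencies fall below 5, so Assumption 2 fails and, by step 3, the procedure should not be applied to these data. The statistic is computed here only to show the mechanics.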