Lecture 41: Sections 14.1-14.3
Hampden-Sydney College
Mon, Apr 7, 2008


The one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories, e.g., yes and no. What if there are several categories, e.g., excellent, good, fair, poor, and not sure? Or the categories may be Republican, Democrat, and Independent. Depending on whether the data are univariate (one variable) or bivariate (two variables), we will perform a goodness-of-fit test or a test of homogeneity.

Definition (Count Data): Data that count the number of observations falling into each of several categories. The data may be univariate or bivariate. Univariate example - Observe a person's opinion of the mayor's performance (excellent, good, fair, poor, not sure). Bivariate example - Observe a person's opinion of the mayor's performance and his or her political affiliation (Republican, Democrat, Independent).

Univariate Example
Observe a person's opinion of the mayor's performance.

  Excellent   Good   Fair   Poor   Not Sure
         72    124    123    150         30

Bivariate Example
Observe each person's opinion and political affiliation.

        Excellent   Good   Fair   Poor   Not Sure
  Dem          38     70     59     86         16
  Rep          21     31     33     23          7
  Ind          13     23     31     41          7

Two Basic Questions
Basic Question for Univariate Data: Do the data fit a specified distribution? For example, could these data have come from a uniform distribution?
Basic Question for Bivariate Data: Could the data in every row have come from the same distribution, whatever that distribution may be?

Observed and Expected Counts
Definition (Observed Counts): The counts that were actually observed in the sample.
Definition (Expected Counts): The counts that would be expected if the null hypothesis were true.

Goodness-of-Fit Tests
Definition (Goodness-of-Fit Test): A hypothesis test that determines whether a set of data reasonably fits a specified distribution. The goodness-of-fit test applies only to univariate data. The null hypothesis specifies a discrete distribution for the population. We want to determine whether a sample from that population supports this hypothesis.

Example
If we roll a die 60 times, we expect 10 of each number. If we get frequencies 8, 10, 14, 12, 9, 7, does that indicate that the die is not fair? What would the distribution of counts be if the die were fair?

Example
If we selected 20 people from a group that was 60% male and 40% female, we would expect to get 12 males and 8 females. If we got 15 males and 5 females, would that indicate that our selection procedure was not random (i.e., discriminatory)? What if we selected 100 people from the group and got 75 males and 25 females?
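As a preview of the machinery developed in the rest of this lecture, here is a minimal Python sketch (assuming SciPy is available) that applies a chi-square comparison of observed and expected counts to both selections. The function scipy.stats.chisquare is standard SciPy; the counts are the ones from the example above.

    # A minimal sketch (assumes SciPy): chi-square comparison of observed
    # counts to the counts expected under the 60%/40% composition.
    from scipy.stats import chisquare

    # 20 people drawn from a 60% male / 40% female group
    print(chisquare(f_obs=[15, 5], f_exp=[12, 8]))    # chi-square = 1.875, p ≈ 0.17
    # 100 people drawn from the same group
    print(chisquare(f_obs=[75, 25], f_exp=[60, 40]))  # chi-square = 9.375, p ≈ 0.002

The first result is quite consistent with random selection; the second would cast serious doubt on it.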

Example
If we deal a 5-card poker hand, we should get
  one pair         42.26% of the time,
  two pairs         9.51% of the time,
  three of a kind   2.11% of the time, and
  something else   46.12% of the time.
Suppose we deal 1000 poker hands and get one pair 427 times, two pairs 102 times, three of a kind 26 times, and something else 445 times. Does this agree with the theory?
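A short sketch (again assuming SciPy is available) of how this comparison can be carried out once the chi-square statistic has been defined; the percentages and counts are the ones listed above.

    # A sketch (assumes SciPy) comparing the 1000 observed poker hands
    # to the counts expected under the theoretical percentages.
    from scipy.stats import chisquare

    observed = [427, 102, 26, 445]               # one pair, two pairs, three of a kind, other
    probs    = [0.4226, 0.0951, 0.0211, 0.4612]  # theoretical proportions
    expected = [1000 * p for p in probs]         # 422.6, 95.1, 21.1, 461.2
    print(chisquare(f_obs=observed, f_exp=expected))  # chi-square ≈ 2.25, p ≈ 0.52

The large p-value suggests the observed hands agree well with the theory.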

Hypotheses
The null hypothesis specifies the probability (or proportion) for each category. To test a die for fairness, the null hypothesis would be
  H₀: p₁ = 1/6, p₂ = 1/6, ..., p₆ = 1/6.
The alternative hypothesis will always be a simple negation of H₀:
  H₁: H₀ is false.
Let α = 0.05.

Expected Counts
The test statistic will involve the observed and the expected counts. To find the expected counts, we apply the hypothetical probabilities to the sample size. For example, if the hypothetical probabilities are all 1/6 and the sample size is 60, then the expected counts are 60 × (1/6) = 10.
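For instance, the expected counts for the fair-die hypothesis can be computed in one line (a minimal Python sketch, no libraries needed):

    # Expected counts = hypothesized probability x sample size (die example).
    n = 60
    probs = [1/6] * 6                  # fair-die hypothesis
    expected = [n * p for p in probs]  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]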

The Test Statistic
Make a chart showing both the observed and expected counts (in parentheses).

  Value         1     2     3     4     5     6
  Observed      8    10    14    12     9     7
  (Expected) (10)  (10)  (10)  (10)  (10)  (10)

Denote the observed counts by O and the expected counts by E. Define the chi-square (χ²) statistic to be

  χ² = Σ over all cells of (O − E)² / E.

Value of the Test Statistic
Clearly, if all of the deviations O − E are small, then χ² will be small. But if even a few of the deviations O − E are large, then χ² will be large. Now calculate χ²:

  χ² = (8 − 10)²/10 + (10 − 10)²/10 + (14 − 10)²/10 + (12 − 10)²/10 + (9 − 10)²/10 + (7 − 10)²/10
     = 0.4 + 0.0 + 1.6 + 0.4 + 0.1 + 0.9
     = 3.4
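The same arithmetic in a short Python sketch (no libraries needed; the counts are those from the chart above):

    # Chi-square statistic computed directly from the observed and expected counts.
    observed = [8, 10, 14, 12, 9, 7]
    expected = [10] * 6
    chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi2_stat, 4))  # 3.4, matching the hand computation above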

Compute the p-value
To compute the p-value of the test statistic, we need to know more about the distribution of χ². How likely is it that we would observe a value as large as 3.4?

Degrees of Freedom
The χ² distribution has an associated number of degrees of freedom, just like the t distribution. Each χ² distribution has a slightly different shape, depending on the number of degrees of freedom. In this test, df is one less than the number of cells.
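A minimal sketch (assuming SciPy is available) of the p-value computation for the die example, where df = 6 − 1 = 5:

    # p-value = area to the right of 3.4 under the chi-square density with df = 5.
    from scipy.stats import chi2
    p_value = chi2.sf(3.4, 5)   # survival function, i.e., 1 - CDF
    print(round(p_value, 3))    # ≈ 0.639, so the data are quite consistent with a fair die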

Degrees of Freedom
[Figures: graphs of the χ² density for df = 1, 2, 3, 4, 5, 6, 7, and 8.]

Properties of χ²
The chi-square distribution with df degrees of freedom has the following properties:
  χ² ≥ 0.
  It is unimodal.
  It is skewed right (not symmetric!).
  μ_χ² = df.
  σ_χ² = √(2·df).
If df is large, then χ²(df) is approximately normal with mean df and standard deviation √(2·df).
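A quick numerical check of the mean and standard-deviation properties (a sketch assuming SciPy is available):

    # Mean and standard deviation of the chi-square distribution for several df.
    from math import sqrt
    from scipy.stats import chi2
    for df in (1, 5, 20, 100):
        mean, var = chi2.stats(df, moments="mv")
        print(df, mean, sqrt(var))  # mean = df, standard deviation = sqrt(2*df)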

χ² vs. Normal
[Figures: the χ² density compared with its normal approximation: χ²(8) vs. N(8, 4), χ²(32) vs. N(32, 8), χ²(128) vs. N(128, 16), and χ²(512) vs. N(512, 32).]
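The comparison in these figures can also be checked numerically; a small sketch (assuming SciPy is available) evaluates both densities at a few points for df = 128:

    # Compare the chi-square density (df = 128) with its normal approximation.
    from math import sqrt
    from scipy.stats import chi2, norm
    df = 128
    for x in (110, 128, 150):
        print(x, round(chi2.pdf(x, df), 5), round(norm.pdf(x, loc=df, scale=sqrt(2 * df)), 5))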

The goodness-of-fit test determines whether the data fit a specified distribution. The chi-square statistic is defined as

  χ² = Σ over all cells of (O − E)² / E.

The closer the data match the specified distribution, the smaller the value of χ².
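Putting the pieces together for the die example, an end-to-end sketch (assuming SciPy is available); scipy.stats.chisquare defaults to uniform expected counts when only the observed counts are given:

    # Goodness-of-fit test for the die data; expected counts default to uniform (10 each).
    from scipy.stats import chisquare
    stat, p = chisquare(f_obs=[8, 10, 14, 12, 9, 7])
    print(round(stat, 2), round(p, 3))  # chi-square = 3.4, p ≈ 0.639: do not reject fairness at α = 0.05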