STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

Similar documents
13.1 Categorical Data and the Multinomial Experiment

Module 10: Analysis of Categorical Data Statistics (OA3102)

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

11-2 Multinomial Experiment

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data

15: CHI SQUARED TESTS

Inference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence

Example. χ 2 = Continued on the next page. All cells

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

STAC51: Categorical data Analysis

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

Discrete Multivariate Statistics

Statistics 3858 : Contingency Tables

Testing Independence

10: Crosstabs & Independent Proportions

Describing Contingency tables

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

10.2: The Chi Square Test for Goodness of Fit

Ling 289 Contingency Table Statistics

Lecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests

Statistics for Managers Using Microsoft Excel

Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr.

Chapter 10. Discrete Data Analysis

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

Ch. 11 Inference for Distributions of Categorical Data

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance

Lecture 28 Chi-Square Analysis

Sociology 6Z03 Review II

Lecture 7: Hypothesis Testing and ANOVA

ij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

Chi-Squared Tests. Semester 1. Chi-Squared Tests

Inferences for Proportions and Count Data

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Practice Questions: Statistics W1111, Fall Solutions

hypotheses. P-value Test for a 2 Sample z-test (Large Independent Samples) n > 30 P-value Test for a 2 Sample t-test (Small Samples) n < 30 Identify α

Interpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score

Contingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878

Section VII. Chi-square test for comparing proportions and frequencies. F test for means

Chapter 10: Chi-Square and F Distributions

79 Wyner Math Academy I Spring 2016

Contingency Tables Part One 1

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Discrete Distributions

Chi-square (χ 2 ) Tests

2.3 Analysis of Categorical Data

Mathematical Notation Math Introduction to Applied Statistics

Basic Business Statistics, 10/e

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

STP 226 ELEMENTARY STATISTICS NOTES

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Chapter 9 Inferences from Two Samples

Chi-square (χ 2 ) Tests

Inferential statistics

STAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data.

10.2 Hypothesis Testing with Two-Way Tables

Chapter 11. Hypothesis Testing (II)

Chapter 2: Describing Contingency Tables - I

Topic 21 Goodness of Fit

Statistics for Managers Using Microsoft Excel/SPSS Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

(x t. x t +1. TIME SERIES (Chapter 8 of Wilks)

Lecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Chi-Square. Heibatollah Baghi, and Mastee Badii

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010

Categorical Data Analysis 1

:the actual population proportion are equal to the hypothesized sample proportions 2. H a

POLI 443 Applied Political Research

Lecture 01: Introduction

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Cohen s s Kappa and Log-linear Models

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

Ordinal Variables in 2 way Tables

STAT 705: Analysis of Contingency Tables

Goodness of Fit Tests

TUTORIAL 8 SOLUTIONS #

Relate Attributes and Counts

Summary of Chapters 7-9

4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X.

Name: Partners: Statistics. Review 11 Version A

Chapter 26: Comparing Counts (Chi Square)

Inferences About Two Proportions

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Hypothesis Testing: Chi-Square Test 1

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

Lecture 8: Summary Measures

Frequency Distribution Cross-Tabulation

Inference for the mean of a population. Testing hypotheses about a single mean (the one sample t-test). The sign test for matched pairs

Chapter 8: Confidence Intervals

Categorical Data Analysis Chapter 3

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

2.4. Conditional Probability

χ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.

Transcription:

STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example 1: Voters are asked which of 6 candidates they prefer. Example 2: Residents are surveyed about which part of Columbia they live in. (Downtown, NW, NE, SW, SE) Multinomial Experiment (Extension of a binomial experiment from 2 to k possible outcomes) (1) Consists of n identical trials (2) There are k possible outcomes (categories) for each trial (3) The probabilities for the k outcomes, denoted p 1, p 2,, p k, are the same for each trial (and p 1 + p 2 + + p k = 1) (4) The trials are independent The cell counts, n 1, n 2,, n k, which are the number of observations falling in each category, are the random variables which follow a multinomial distribution.

Analyzing a One-Way Table Suppose we have a single categorical variable with k categories. The cell counts from a multinomial experiment can be arranged in a one-way table. Example 1: Adults were surveyed about their favorite sport. There were 6 categories. p 1 = proportion of U.S. adults favoring football p 2 = proportion of U.S. adults favoring baseball p 3 = proportion of U.S. adults favoring basketball p 4 = proportion of U.S. adults favoring auto racing p 5 = proportion of U.S. adults favoring golf p 6 = proportion of U.S. adults favoring other It was hypothesized that the true proportions are (p 1, p 2, p 3, p 4, p 5, p 6 ) = (.4,.1,.2,.06,.06,.18). 95 adults were randomly sampled; their preferences are summarized in the one-way table: Favorite Sport Football Baseball Basketball Auto Golf Other n 37 12 17 8 5 16 95 We test our null hypothesis (at =.05) with the following test:

Test for Multinomial Probabilities H 0 : p 1 = p 1,0, p 2 = p 2,0,, p k = p k,0 H a : at least one of the hypothesized probabilities is wrong The test statistic is: where n i is the observed cell count for category i and E(n i ) is the expected cell count for category i if H 0 is true. Rejection region: 2 > 2 where 2 based on k 1 d.f. (large values of 2 => observed n i very different from expected E(n i ) under H 0 ) Assumptions: (1) The data are from a multinomial experiment. (2) Every expected cell count E(n i ) is at least 5. (large-sample test) Finding expected cell counts: Note that E(n i ) = np i,0.

For our data, i n i E(n i ) Test statistic value: From Table VII:

Analyzing a Two-Way Table Now we consider observations that are classified according to two categorical variables. Such data can be presented in a two-way table (contingency table). Example: Suppose the people in the favorite-sport survey had been further classified by gender: Two categorical variables: Gender and Favorite Sport. Question: Are the two classifications independent or dependent? For instance, does people s favorite sport depend on their gender? Or does gender have no association with favorite sport?

Observed Counts for a r c Contingency Table (r = # of rows, c = # of columns) Column Variable 1 2 c Row Totals 1 n 11 n 12 n 1c R 1 Row 2 n 21 n 22 n 2c R 2 Variable r n r1 n r2 n rc R r Col. Totals C 1 C 2 C c n Probabilities for a r c Contingency Table: Column Variable 1 2 c 1 p 11 p 12 p 1c p row 1 Row 2 p 21 p 22 p 2c p row 2 Variable r p r1 p r2 p rc p row r p col 1 p col 2 p col c 1 Note: If the two classifications are independent, then: p 11 = (p row 1 )(p col 1 ) and p 12 = (p row 1 )(p col 2 ), etc. So under the hypothesis of independence, we expect the cell probabilities to be the product of the corresponding marginal probabilities.

Hence the (estimated) expected count in cell (i, j) is simply: 2 test for independence H 0 : The classifications are independent H a : The classifications are dependent Test statistic: where the expected count in cell (i, j) is Eˆ( n ij ) R C i n j Rejection region: 2 > 2, where 2 is based on (r 1)(c 1) d.f. and r = # of rows, c = # of columns.

Note: We need the sample size to be large enough that every expected cell count is at least 5. Example: Does the incidence of heart disease depend on snoring pattern? (Test using =.05.) Random sample of 2484 adults taken; results given in a contingency table: Snoring Pattern Never Occasionally Every Night ------------------------------------------------------------------------------------- Heart Yes 24 35 51 110 Disease No 1355 603 416 2374 -------------------------------------------------------------------------------------- 1379 638 467 2484 Expected Cell Counts: Test statistic: