Ling 289 Contingency Table Statistics


Roger Levy and Christopher Manning

This is a summary of the material that we've covered on contingency tables:

- Contingency tables: introduction
- Odds ratios
- Counting, a bit of combinatorics, the binomial distribution
- Intro to hypothesis testing, degrees of freedom
- G² test (likelihood ratio)
- Chi-squared test
- Fisher's exact test

1. Contingency tables

There are many situations in quantitative linguistic analysis where you will be interested in the possibility of association between two categorical variables. In this case, you will often want to represent your data as a contingency table. Here's an example from a project by Roger Levy: parallelism in phrase coordination, [[NP1] and [NP2]]. I was interested in whether NP1 and NP2 tended to be similar to each other. As one instance of this, I looked at the patterns of PP modification of each NP (whether the NP had a PP modifier or not) in the Brown and Switchboard corpora, and came up with contingency tables like this:

      Brown                            Switchboard
                  NP2                              NP2
               hasPP   noPP                     hasPP   noPP
  NP1 hasPP       95     52        NP1 hasPP       78     76
      noPP       174    946            noPP       325   1230

Odds and odds ratios

Given a contingency table of the form

            Y
          y1    y2
  X  x1   n11   n12
     x2   n21   n22

one of the things that's useful to talk about is how the value of one variable affects the distribution of the other. For example, writing n for the table total, the overall distribution of Y is

  P(y1) = (n11 + n21)/n        P(y2) = (n12 + n22)/n

Alternatively we can speak of the overall odds of y1 versus y2:

  P(y1)/P(y2) = [(n11 + n21)/n] / [(n12 + n22)/n] = (n11 + n21)/(n12 + n22)

If X = x1, then the odds for Y are just Ω1 = n11/n12. If the odds of Y for X = x2 are greater than the odds of Y for X = x1, then the outcome X = x2 increases the chances of Y = y1. We can express the effect of the outcome of X on the odds of Y by the odds ratio (which turns out to be symmetric between X and Y):

  θ = Ω1/Ω2 = (n11/n12)/(n21/n22) = (n11 × n22)/(n12 × n21)

An odds ratio θ = 1 indicates no association between the variables. For the Brown and Switchboard parallelism examples:

  θ_Brown = (95 × 946)/(52 × 174) = 9.93        θ_Swbd = (78 × 1230)/(76 × 325) = 3.88

So the presence of PPs in the left and right conjunct NPs seems more strongly interconnected for the Brown (written) corpus than for the Switchboard (spoken) corpus.
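To make the arithmetic concrete, here is a minimal Python sketch (not part of the original handout; the nested-list layout of the tables is just an illustrative choice) that computes the odds ratio n11·n22/(n12·n21) for the two example tables.

```python
def odds_ratio(t):
    """Odds ratio theta = (n11 * n22) / (n12 * n21) for a 2x2 table [[n11, n12], [n21, n22]]."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# Rows: NP1 hasPP / noPP; columns: NP2 hasPP / noPP
brown = [[95, 52], [174, 946]]
swbd = [[78, 76], [325, 1230]]

print(odds_ratio(brown))  # ~9.93
print(odds_ratio(swbd))   # ~3.88
```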

2. Counting, combinatorics and binomial distribution

Take a binary variable X, and let its possible values be 0 and 1 (e.g., coin flip: heads = 1, tails = 0; NP conjunct: hasPP = 1, noPP = 0, ...). Call p its probability of outcome 1. (This is called a Bernoulli random variable with parameter p.)

Suppose we take a variable Y to be the sum of four such Bernoulli random variables X_i, where all the X_i are independent and each has the same parameter p. By counting, the probability distribution of Y is:

  Y    P(Y)
  4    1 × p^4
  3    4 × p^3 (1-p)
  2    6 × p^2 (1-p)^2
  1    4 × p (1-p)^3
  0    1 × (1-p)^4

This is called the four-trial binomial distribution with parameter p. A binomial distribution is determined by two parameters: the Bernoulli probability p and the number of trials n. The binomial probability distribution for p, n is

  P(k) = (n choose k) p^k (1-p)^(n-k)

where (n choose k) = n!/(k!(n-k)!) is the number of ways of choosing k out of n items.

The generalization of the binomial distribution to random variables with more than two outcomes is called a multinomial distribution.
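As a quick sanity check of the formula above, here is a small Python sketch (not from the handout; the value p = 0.3 is an arbitrary illustrative choice) that tabulates the four-trial binomial distribution and confirms the probabilities sum to 1.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k) = (n choose k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.3  # arbitrary illustrative parameter
probs = [binom_pmf(k, 4, p) for k in range(5)]
for k, pr in enumerate(probs):
    print(k, round(pr, 4))
print(sum(probs))  # should be 1.0 (up to floating-point rounding)
```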

3. Degrees of freedom

This cool-sounding concept means the number of things that are left to vary after you have fit parameters based on your data. It is usually used to talk about the difference between the numbers of parameters that are estimated under the two hypotheses in a hypothesis test.

Suppose we wanted a model of the coordination example above in which PP modification is independent in the left and right NP, though its likelihood may differ by conjunct. This model has 2 parameters to estimate from the data: the marginal probabilities of NP1 having a PP and of NP2 having a PP. These determine the cell probabilities. In a slightly more general model, PP modification in the two conjuncts need not be independent: each cell of the table gets its own probability. This model requires estimating 3 probabilities: we put a probability estimate in 3 of the cells, and the fourth is given by the constraint that the probabilities add to 1. Note also that the latter model space contains the former (this is required for most tests that compare models using degrees of freedom, dof). The dof is the difference between the numbers of parameters estimated under the alternative and null hypotheses:

  dof = 3 - 2 = 1

More generally, if you have an I × J table, we can continue to follow the above logic, noting that a multinomial distribution with k possible outcomes has k - 1 free parameters (since there is a constraint that all the p_i parameters sum to 1). From this, we can derive the way the dof of a contingency table is usually expressed:

  dof = (IJ - 1) - [(I - 1) + (J - 1)] = IJ - I - J + 1 = (I - 1)(J - 1)

This is the basis for the dof in both the χ² and G² tests below.
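The parameter-counting argument can be spelled out mechanically. The following sketch (illustrative only, not from the handout) checks that (IJ - 1) - [(I - 1) + (J - 1)] equals (I - 1)(J - 1) for a few table shapes.

```python
def dof_by_counting(I, J):
    free_alt = I * J - 1           # saturated multinomial over the I*J cells
    free_null = (I - 1) + (J - 1)  # independent row and column marginals
    return free_alt - free_null

for I, J in [(2, 2), (2, 3), (4, 5)]:
    assert dof_by_counting(I, J) == (I - 1) * (J - 1)
    print(I, J, dof_by_counting(I, J))
```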

4. Introductory hypothesis testing

Hypothesis testing typically goes like this: think up a specific, relatively simple model for your data. Call this the null hypothesis H0. Contrast that with a more general model for your data, of which H0 is a special case. Call this more general model HA, the alternative hypothesis. You will then calculate some statistic of your data based on H0 with respect to HA. That statistic will have some probability distribution on the assumption that H0 is correct. If the observed value of the statistic is sufficiently improbable under H0, reject H0 in favor of the more general HA.

5. Likelihood ratio test

With this test, the statistic you calculate for your data D is the likelihood ratio

  Λ = max P(D; H0) / max P(D; HA)

that is, the ratio of the maximum data likelihood under H0 to the maximum data likelihood under HA. This requires that you explicitly formulate H0 and HA. The quantity -2 log Λ is distributed approximately like a chi-squared distribution [see below] with degrees of freedom equal to the difference in the number of free parameters in HA and H0. So this value can be evaluated against the χ² distribution to see whether the null hypothesis should be rejected. [Danger: don't apply this test when expected cell counts are low, e.g. < 5.]

6. Chi-squared test

Apply this test to all sorts of contingency tables, provided you have a model with k parameters that predicts expected values E_ij for all cells. You calculate the X² statistic:

  X² = Σ_ij (n_ij - E_ij)² / E_ij

In the chi-squared test, HA is the model in which each cell of your table has its own parameter p_i in one big multinomial distribution. Therefore, if your contingency table has n cells, the difference in the number of free parameters between HA and H0 is n - k - 1. Correspondingly, you can look up the p-value of your X² statistic for the χ² distribution with n - k - 1 degrees of freedom. [Note that the χ² distribution is a mathematical curve, which is distinct from both the X² statistic computed from your data and the χ² test procedure itself.] [Danger: don't apply this test when expected cell counts are low, e.g. < 5.]
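For the independence null against the saturated alternative, -2 log Λ works out to the familiar G² = 2 Σ n_ij log(n_ij / E_ij), with E_ij the expected counts under independence. Here is a hedged sketch of that calculation (not part of the handout; it reuses the Brown table from Section 1, and SciPy is assumed only for the p-value).

```python
from math import log
from scipy.stats import chi2

def g_squared(table):
    """G^2 = -2 log(Lambda) for independence in a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_tot[i] * col_tot[j] / n  # expected count under independence
            g2 += 2 * n_ij * log(n_ij / e_ij)
    return g2

brown = [[95, 52], [174, 946]]
g2 = g_squared(brown)
print(g2, chi2.sf(g2, df=1))  # statistic and its p-value on 1 degree of freedom
```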

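The Pearson statistic for the same table can be computed directly from the X² formula, or via scipy.stats.chi2_contingency, which also returns the expected counts; correction=False turns off the Yates continuity correction so the result matches the plain formula. (Again a sketch, not part of the handout.)

```python
from scipy.stats import chi2_contingency

brown = [[95, 52], [174, 946]]
x2, pval, dof, expected = chi2_contingency(brown, correction=False)
print(x2, pval, dof)  # Pearson X^2, its p-value, and the degrees of freedom (1 here)
print(expected)       # E_ij under the independence null
```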
7. Fisher's exact test

This test applies to a 2-by-2 contingency table:

            Y
          y1    y2
  X  x1   n11   n12   n1.
     x2   n21   n22   n2.
          n.1   n.2   n

H0 is the model that all marginal totals are fixed, but that the individual cell counts are not; stated another way, the individual outcomes of X and Y are independent. [H0 has one free parameter; why?] HA is the model that the individual outcomes of X and Y are not independent. With Fisher's exact test, you directly calculate the exact probability of obtaining a result as extreme as, or more extreme than, the result that you got. [Since it is an exact test, you can use Fisher's exact test regardless of expected and actual cell counts.]
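A minimal sketch of the test in practice (again using the Brown table purely for illustration): scipy.stats.fisher_exact handles 2-by-2 tables and returns the sample odds ratio together with the exact p-value.

```python
from scipy.stats import fisher_exact

brown = [[95, 52], [174, 946]]
oddsratio, pval = fisher_exact(brown, alternative='two-sided')
print(oddsratio)  # ~9.93, the same odds ratio computed in Section 1
print(pval)       # exact two-sided p-value
```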