Contingency Tables


I. Definition & Examples.

A) Contingency tables are tables where we are looking at two (or more - but we won't cover three or more way tables, it's way too complicated) factors, each with two or more levels or values.
   1) A factor could be defined as a categorical variable; examples might be sex, color, age, etc.
   2) A level could be defined as one of the different values for the factor; examples matching the above might be {male, female}, {red, blue, green}, {0-4, 5-9, 10-14, etc.}.
B) For each combination of levels, we have a certain count, which tells us how many individuals (or whatever we're measuring) are in that category.
C) An example (data from 1988, Florida):

                                    Injury
   Safety equipment in use    Fatal    Non-fatal      Total
   None                       1,601      165,527    167,128
   Seat belt                    510      412,368    412,878

   1) We have two factors, injury and safety equipment used, each with two levels.
D) So what are we interested in??
   1) Depends. In the example above, what would be interesting to know?
      a) Does seat-belt use save lives??
         i) Specifically, is $p_1 = p_2$, where:
            $p_1$ = proportion of fatalities for people not wearing a seat belt (estimated by $\hat{p}_1$).
            $p_2$ = proportion of fatalities for people wearing a seat belt (estimated by $\hat{p}_2$).
      b) Incidentally, we're obviously interested in a one-sided alternative here (why?).

   2) Note - your text arranges the tables so that the proportions you compare always go across columns. It's reversed above, but it makes no difference.
E) Another example (10.20, p. 405 [10.21, p. 414] {10.3.1, p. 373} in text):

                        Hair color
                        dark    light
   Eye color   dark      729      131
               light    3129     2814

   1) Now what do we want to know?
   2) Does it make sense to figure out if the proportion of people with dark eyes is the same if they have dark hair or light hair?
   3) Well, maybe. It might be much more interesting to ask the question like this: Does eye color influence hair color, or vice-versa? Or in statistical language: Is eye color independent of hair color?
F) So, depending on the kind of data, we're either interested in:
   1) Comparing proportions
   2) Establishing independence/dependence
G) So we phrase our hypotheses accordingly:
   1) $H_0: p_1 = p_2$, where the p's are proportions
      $H_1: p_1 \neq p_2$ (or $p_1 > p_2$, etc.)
   2) $H_0$: Factor 1 and factor 2 are independent
      $H_1$: Factor 1 and factor 2 are dependent
H) Now let's choose α, just as always.
I) Calculate our test statistic:

   $$\chi^{2*} = \sum_{i=1}^{c} \frac{(O_i - E_i)^2}{E_i}$$

   1) Note that this is identical to the statistic used for the goodness of fit test. Now c is the number of cells in our table (four in both of our examples so far).
   2) We'll figure out how to get expected values below.
J) Look up the tabulated $\chi^2$ value.
   1) Our degrees of freedom are NOT c-1 anymore. Instead, they are (r-1) x (k-1), where: r = # of rows, and k = # of columns
   2) So for both our examples so far we have: (2-1) x (2-1) = 1 x 1 = 1
K) Compare our $\chi^{2*}$ with $\chi^2_{\text{table}}$, and if it's larger (or equal), then reject $H_0$.
L) State our conclusion in terms of our original hypothesis.
M) Expected values:
   1) Suppose that hair color has no effect on eye color.
      a) That implies that the proportion of people with dark eyes is the same in both columns.
      b) Therefore, figure out what your overall proportion is using the row totals (add up all people with dark eyes, and then all people with light eyes, then divide the number of people with dark eyes by the total number of people).
      c) Now figure out how many people had dark hair. Out of these, the proportion you calculated in (b) is the fraction you expect to have dark eyes (multiply the number of people with dark hair by your proportion, to get the expected number of people with dark hair and dark eyes).
   2) The easy way (which is identical to (1)):

      $$\text{Expected value} = \frac{(\text{Row total})(\text{Column total})}{(\text{Grand total})}$$

      Above, you figured out the expected number of people with dark hair and dark eyes. To do this, you:

      - figured out the number of people with dark eyes (row 1 total)
      - divided by total # of people (grand total)
      - multiplied by # of people with dark hair (column 1 total)

II. Specific examples

A) Let's work out the first example above, on seat belt use and fatalities.
   1) State hypotheses:
      $H_0$: The proportion of people killed is the same whether or not they are wearing a seatbelt
      or: $H_0: p_1 = p_2$ (as defined above)
      $H_1$: The proportions are not the same (we'll stick with a two-sided test for the moment)
   2) α = .05
   3) Let's calculate our expected values. We observed:

                                    Injury
   Safety equipment in use    Fatal    Non-fatal      Total
   None                       1,601      165,527    167,128
   Seat belt                    510      412,368    412,878
   Total                      2,111      577,895    580,006

      So, our first expected value (fatal, none): (2,111 x 167,128)/580,006 = 608.28
      Our second expected value (non-fatal, none): (577,895 x 167,128)/580,006 = 166,519.72

      Our third expected value (fatal, seat belt): (2,111 x 412,878)/580,006 = 1,502.72
      And finally, (non-fatal, seat belt): (577,895 x 412,878)/580,006 = 411,375.28
      We can put this in a table if we want (to keep it straight):

      Expected values:
                                       Injury
      Safety equipment in use     Fatal     Non-fatal      Total
      None                       608.28    166,519.72    167,128
      Seat belt                1,502.72    411,375.28    412,878
      Total                       2,111       577,895    580,006

   4) Now let's calculate our $\chi^{2*}$:

      $$\chi^{2*} = \frac{(1601 - 608.28)^2}{608.28} + \frac{(165{,}527 - 166{,}519.72)^2}{166{,}519.72} + \frac{(510 - 1502.72)^2}{1502.72} + \frac{(412{,}368 - 411{,}375.28)^2}{411{,}375.28} = 2284.25$$

   5) If we look up our critical value of $\chi^2$ in the table (1 d.f., and α = .05), we get: $\chi^2_{.05,1} = 3.84$
   6) So we reject $H_0$ and conclude that seat belts do affect the outcome of a traffic accident. Incidentally, $p < 1 \times 10^{-497}$, though I sort of doubt the accuracy of that figure.
   7) However, we suspect seatbelts save lives, so what we really wanted was a one sided test:
      a) Proceed as above, though now your alternative hypothesis is $H_1: p_1 > p_2$

         ($p_1$ is the proportion of fatalities for folks not wearing seatbelts).
      b) Make sure the data deviate from the null hypothesis in the direction of your alternative, otherwise STOP. In other words, make sure $\hat{p}_1 > \hat{p}_2$.
      c) Now just use the appropriate column in your $\chi^2$ tables (divide by two): $\chi^2_{\text{table}} = \chi^2_{.05,1}$ (one-sided) $= 2.71$. This is the same number as the two-sided critical value for α = .10.
      d) And again we get to reject (notice that the value is lower, which makes rejection easier and gives us more power - which is the reason for doing one sided tests!)
      e) If we do this for the above, we conclude with a very high degree of certainty that seatbelts save lives. (notice that $\hat{p}_1 = 0.00958$, $\hat{p}_2 = 0.00124$)
B) Now let's quickly do our second example:

                        Hair color
                        dark    light
   Eye color   dark      729      131
               light    3129     2814

   1) $H_0$: Eye color and hair color are independent
      $H_1$: They are not independent
   2) α = .05
   3) Calculate expected values (I'm skipping the details, they're in your text, and I went through them above):

                        Hair color
                        dark       light
   Eye color   dark     487.71     372.29
               light   3370.29    2572.71

   4) Calculate $\chi^{2*}$ (the same as usual, I'm skipping the details): $\chi^{2*} = 315.671$
   5) Our tabulated $\chi^2 = 3.84$
   6) So we reject our $H_0$, and conclude $H_1$.
   7) Do we want to do anything else??
      a) Yes, we can see in which direction our sample deviates from our $H_0$. In other words, since eye color and hair color are not independent, we can look and see if dark eyes occur more commonly with dark hair, or if dark eyes occur more commonly with light hair:
      b) If a person has dark hair, the proportion of dark eyes is 729/3858 = 0.1890.
      c) If a person has light hair, the proportion of dark eyes is 131/2945 = 0.0445.
      d) From that information, we can conclude that dark hair goes with dark eyes, and therefore light hair with light eyes. (No surprise: blond, blue-eyed, etc. etc.)

III. R x K tables.

A) This is pretty easy to deal with. With the possible exception of figuring out your hypotheses, you know how to do this.
   Exercise 10.43 [10.40, p. 432] {10.5.3, p. 389} from text:

   Blood type    Ulcer patients    Controls    TOTAL
   O                        911        4578     5489
   A                        579        4219     4798
   B                        124         890     1014
   AB                        41         313      354
   TOTAL                   1655       10000    11655

   What hypothesis might we be interested in? (Think about this logically, and come up with something that makes sense - this does take a little practice, unless you're the one who designed the study!)
   $H_0$: The proportions of blood types are the same in ulcer patients as in controls.
   $H_1$: The proportions are not the same.

   (note that we really have a compound $H_0$; a directional alternative makes no sense)
   α = .05
   $\chi^{2*} = 49.016$
   d.f. = ν = (r-1)(k-1) = 3 x 1 = 3
   From our table, the critical value of chi-square with 3 d.f. and an α of .05 = 7.81, so we reject.
   If we wanted a p-value, we note that the table only goes up to a $\chi^2$ of 21.11 (for 3 d.f.), so we know that our p-value < .0001.
   Our conclusion is that the proportions of blood types are not the same in the two groups.

IV. Comments:

A) One can calculate things like odds ratios and relative risk.
   - these are actually very important in medical trials.
   - we will look at some of this in the next set of notes.
B) The assumptions are identical to those for a goodness of fit test (i.e., random data and smallest expected value ≥ 5).
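
The whole procedure in these notes (expected values from the row and column totals, the chi-square statistic, the (r-1)(k-1) degrees of freedom, and the check that the smallest expected value is at least 5) can be sketched in a few lines of Python. This is only an illustrative sketch, not something from the text: the function name `chi_square_table` is made up here, and the critical value is simply the tabulated 7.81 quoted above rather than something the code computes.

```python
def chi_square_table(observed):
    """Return (chi_sq, df, expected) for an r x k table of counts.

    Hypothetical helper for illustration; `observed` is a list of rows.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected value = (row total)(column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # chi-square* = sum over all cells of (O - E)^2 / E
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom = (r - 1)(k - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df, expected


# Blood type (rows: O, A, B, AB) by group (columns: ulcer patients, controls)
blood = [[911, 4578], [579, 4219], [124, 890], [41, 313]]
chi_sq, df, expected = chi_square_table(blood)

# Assumption check from the notes: smallest expected value should be >= 5
smallest_expected = min(min(row) for row in expected)

CRITICAL_05_3DF = 7.81  # tabulated value quoted in the notes (alpha = .05, 3 d.f.)

print(f"chi-square* = {chi_sq:.3f}, d.f. = {df}")
print(f"smallest expected value = {smallest_expected:.2f}")
print("reject H0" if chi_sq >= CRITICAL_05_3DF else "do not reject H0")
```

Running the same function on the seat belt table, `[[1601, 165527], [510, 412368]]`, should give a statistic close to the 2284.25 worked out by hand in section II. Small differences are expected because the hand calculation uses expected values rounded to two decimals.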