General IxJ Contingency Tables


We now generalize our previous results from the prospective, retrospective and cross-sectional studies and the Poisson sampling case to $I \times J$ contingency tables. For such tables, the test of hypothesis for

- a prospective study is $H_0: \pi_{j|1} = \pi_{j|2} = \cdots = \pi_{j|I}$ for all $j$
- a retrospective study is $H_0: \pi_{X_i|Y_1} = \pi_{X_i|Y_2} = \cdots = \pi_{X_i|Y_J}$ for all $i$
- a cross-sectional study, which tests independence, is $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$

Example: As part of the 1991 General Social Survey (GSS), a sample of 980 subjects was classified according to gender and political party identification, with the following results:

                      PARTY IDENTIFICATION
          Independent    Democrat       Republican     Total
Female    n11 = 73       n12 = 279      n13 = 225      n1+ = 577
Male      n21 = 47       n22 = 165      n23 = 191      n2+ = 403
Total     n+1 = 120      n+2 = 444      n+3 = 416      n = 980 (fixed)

Are the variables Gender and Party Identification independent? Alternatively, we might ask: does gender have an effect on political party identification?

Let $\pi_{ij}$ = probability that an individual is of gender $i$ ($i = 1, 2$) and identifies with party $j$ ($j = 1, 2, 3$).

$H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for $i = 1, 2$ and $j = 1, 2, 3$

For the case of $2 \times 2$ contingency tables, the expected frequencies were given by

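$$m_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}\,n_{+j}}{n}$$

and we used the $m_{ij}$ to test the null hypothesis using either Pearson's

$$X^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$$

or the likelihood-ratio statistic

$$G^2 = 2\sum_{i}\sum_{j} n_{ij} \log\frac{n_{ij}}{m_{ij}}$$

with both having df $= (I-1)(J-1) = (2-1)(2-1) = 1$. The same formulas apply to a general $I \times J$ table with df $= (I-1)(J-1)$; for the $2 \times 3$ table above, df $= (2-1)(3-1) = 2$. A SAS program follows.

SAS PROGRAM (data internal)

    /* GENDER is an assumed name for the first variable */
    data politics;
      input GENDER $ PARTYID $ count;
      cards;
    female indpdt   73
    female democrat 279
    female republic 225
    male   indpdt   47
    male   democrat 165
    male   republic 191
    ;
    proc freq order=data;
      tables GENDER*PARTYID / chisq expected cellchi2;
      weight count;
    run;

The values $X^2 = 7.01$ with p-value 0.0300 and $G^2 = 7.00$ with p-value 0.0302 (both approximately chi-square with df = 2) indicate strong evidence that gender and party identification are not independent.

Now consider ways to investigate the association between two variables in a general $I \times J$ table.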
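As a cross-check on the SAS results, the two test statistics can be recomputed by hand. The following is a minimal Python sketch, not part of the original notes; the function names are illustrative, and the p-value uses the closed form $e^{-x/2}$ for the upper tail of a chi-square distribution with 2 degrees of freedom, so it only applies to a $2 \times 3$ table like this one.

```python
import math

# Observed counts from the 1991 GSS example
# (rows: Female, Male; columns: Independent, Democrat, Republican).
observed = [[73, 279, 225],
            [47, 165, 191]]

def expected_counts(table):
    """Expected frequencies m_ij = n_i+ * n_+j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_x2(table):
    """Pearson statistic: sum of (n_ij - m_ij)^2 / m_ij over all cells."""
    m = expected_counts(table)
    return sum((n_ij - m_ij) ** 2 / m_ij
               for row_n, row_m in zip(table, m)
               for n_ij, m_ij in zip(row_n, row_m))

def likelihood_ratio_g2(table):
    """Likelihood-ratio statistic: 2 * sum of n_ij * log(n_ij / m_ij)."""
    m = expected_counts(table)
    return 2 * sum(n_ij * math.log(n_ij / m_ij)
                   for row_n, row_m in zip(table, m)
                   for n_ij, m_ij in zip(row_n, row_m)
                   if n_ij > 0)  # empty cells contribute zero

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square with exactly 2 df."""
    return math.exp(-x / 2)

x2 = pearson_x2(observed)
g2 = likelihood_ratio_g2(observed)
print(f"X2 = {x2:.2f}, p-value = {chi2_sf_df2(x2):.4f}")
print(f"G2 = {g2:.2f}, p-value = {chi2_sf_df2(g2):.4f}")
```

For tables with other degrees of freedom, a general chi-square tail function from a statistics library would be needed in place of `chi2_sf_df2`.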

1. Standardized Residuals

Standardized residuals under $H_0$ are given by

$$e_{ij} = \frac{n_{ij} - m_{ij}}{\sqrt{m_{ij}}}.$$

These standardized residuals can be compared to standard normal percentiles (such as $\pm 1.96$). Since the asymptotic variance of $e_{ij}$ is less than 1, they provide conservative indications of cells that have a lack of fit under the null hypothesis. For example, we obtain the following standardized residuals (computed under the assumption of $H_0$) for our data set:

                      PARTY IDENTIFICATION
          Independent      Democrat         Republican
Female    e11 =  0.28      e12 =  1.09      e13 = -1.27
Male      e21 = -0.33      e22 = -1.30      e23 =  1.52

These standardized residuals do not appear unusual. (Note: $e_{23} = 1.52$ appears to indicate that the number of males who identify as Republican is higher than what would be expected if gender and party identification were independent, but this standardized residual is still not very large.)

2. Adjusted Residuals

Consider instead the residual $n_{ij} - m_{ij} = n(p_{ij} - \hat{\pi}_{ij})$, where $p_{ij} = n_{ij}/n$ is the sample proportion. Under $H_0$,

$$p_{ij} - \hat{\pi}_{ij} = p_{ij} - \hat{\pi}_{i+}\hat{\pi}_{+j} = p_{ij} - p_{i+}p_{+j} = p_{ij} - (p_{i1} + p_{i2} + \cdots + p_{iJ})(p_{1j} + p_{2j} + \cdots + p_{Ij}).$$

To get the asymptotic variance of $n_{ij} - m_{ij}$, we resort to the delta method. Set

$$g(p) = \frac{n_{ij} - m_{ij}}{n} = p_{ij} - (p_{i1} + p_{i2} + \cdots + p_{iJ})(p_{1j} + p_{2j} + \cdots + p_{Ij})$$

and

$$g(\pi) = \pi_{ij} - (\pi_{i1} + \pi_{i2} + \cdots + \pi_{iJ})(\pi_{1j} + \pi_{2j} + \cdots + \pi_{Ij}).$$

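Then the asymptotic variance of $\sqrt{n}\,[g(p) - g(\pi)]$ is

$$\sum_{a}\sum_{b} \pi_{ab}\,\phi_{ab}^2 - \Big(\sum_{a}\sum_{b} \pi_{ab}\,\phi_{ab}\Big)^2, \qquad \text{where } \phi_{ab} = \frac{\partial g}{\partial \pi_{ab}}.$$

Under $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ we can simplify the expression. Using $\pi_{ij} = \pi_{i+}\pi_{+j}$, the asymptotic variance of $\sqrt{n}\,[g(p) - g(\pi)]$ becomes

$$\pi_{i+}\pi_{+j}(1 - \pi_{i+})(1 - \pi_{+j})$$

and thus the asymptotic variance of $n\,g(p) = n_{ij} - m_{ij}$ is

$$n\,\pi_{i+}\pi_{+j}(1 - \pi_{i+})(1 - \pi_{+j}),$$

which can be estimated by

$$n\,\hat{\pi}_{i+}\hat{\pi}_{+j}(1 - \hat{\pi}_{i+})(1 - \hat{\pi}_{+j}) = m_{ij}(1 - p_{i+})(1 - p_{+j}),$$

where $\hat{\pi}_{i+} = p_{i+} = n_{i+}/n$ and $\hat{\pi}_{+j} = p_{+j} = n_{+j}/n$.

We define the adjusted residuals under $H_0$ as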

$$r_{ij} = \frac{n_{ij} - m_{ij}}{\sqrt{m_{ij}(1 - p_{i+})(1 - p_{+j})}}.$$

These have an asymptotically standard normal distribution, so they allow us to detect abnormally large (in absolute value) adjusted residuals. For our data, we obtain the following adjusted residuals:

                      PARTY IDENTIFICATION
          Independent      Democrat         Republican
Female    r11 =  0.46      r12 =  2.29      r13 = -2.62
Male      r21 = -0.46      r22 = -2.29      r23 =  2.62

The adjusted residuals indicate that the number of females who identify as Democrat is higher than would be expected if gender and party identification were independent (i.e. $r_{12} = 2.29$). They also indicate that the number of males who identify as Republican is higher than would be expected under the assumption that gender and party identification are independent (i.e. $r_{23} = 2.62$).
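The standardized and adjusted residuals defined above differ only by the factor $\sqrt{(1 - p_{i+})(1 - p_{+j})}$ in the denominator. The following is a minimal Python sketch of both calculations for the GSS table; the helper name `residuals` and the 0-based cell indexing are choices made for this illustration and do not come from the notes.

```python
import math

# Observed counts (rows: Female, Male;
# columns: Independent, Democrat, Republican).
observed = [[73, 279, 225],
            [47, 165, 191]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

def residuals(i, j):
    """Return (e_ij, r_ij) for cell (i, j), using 0-based indices.

    e_ij = (n_ij - m_ij) / sqrt(m_ij)                  standardized
    r_ij = e_ij / sqrt((1 - p_i+) * (1 - p_+j))        adjusted
    """
    m_ij = row_totals[i] * col_totals[j] / n
    p_row = row_totals[i] / n
    p_col = col_totals[j] / n
    e_ij = (observed[i][j] - m_ij) / math.sqrt(m_ij)
    r_ij = e_ij / math.sqrt((1 - p_row) * (1 - p_col))
    return e_ij, r_ij

for i, gender in enumerate(["Female", "Male"]):
    for j, party in enumerate(["Independent", "Democrat", "Republican"]):
        e_ij, r_ij = residuals(i, j)
        print(f"{gender:6s} {party:11s} e = {e_ij:6.2f}  r = {r_ij:6.2f}")
```

Because the divisor is smaller than 1, each adjusted residual is larger in absolute value than the corresponding standardized residual, which is why cells that looked unremarkable on the standardized scale can exceed 1.96 on the adjusted scale.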

Partitioning the Likelihood Ratio Chi-Square Statistic ($G^2$)

Another way to study the association between two variables in an $I \times J$ table involves partitioning the likelihood ratio chi-square statistic so that the components represent certain aspects of the association. Partitioning may reveal that an association primarily reflects differences between certain categories or groupings of categories.

Consider our data and look now at the $2 \times 2$ subtable based on the first two columns of the original $2 \times 3$ table:

                 PARTY IDENTIFICATION
          Independent    Democrat       Total
Female    n11 = 73       n12 = 279      n1+ = 352
Male      n21 = 47       n22 = 165      n2+ = 212
Total     n+1 = 120      n+2 = 444      n = 564

A test of independence on this table yields $G^2 = 0.16$ with df = 1 (associated p-value of 0.689). Thus, based on this subsample of the full dataset, we do not have sufficient evidence to refute the null hypothesis of independence of gender and party identification (being Independent or Democrat). This means that, of those subjects who identify either as Independent or Democrat, there does not appear to be much evidence of a difference between females and males in the relative numbers in the two categories.

Suppose we form a different, second $2 \times 2$ table as follows (based on the above result, we have collapsed the Independent and Democrat groups into a combined grouping):

                 PARTY IDENTIFICATION
          Indep/Dem      Republican     Total
Female    n11 = 352      n12 = 225      n1+ = 577
Male      n21 = 212      n22 = 191      n2+ = 403
Total     n+1 = 564      n+2 = 416      n = 980

A test of independence on this table yields $G^2 = 6.84$ with df = 1 (associated p-value of 0.0089). Here we have strong evidence against the null hypothesis of independence. Thus there is strong evidence of a difference between females and males in the relative numbers identifying as Republican instead of Independent or Democrat.

Note that, summing the two $G^2$ values above, we obtain $0.16 + 6.84 = 7.00$, i.e. the sum of the two $G^2$ components equals $G^2$ for the complete $2 \times 3$ table. (The same holds for the degrees of freedom.)
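The additivity of the two components can be checked numerically. The following is a minimal Python sketch under the same data; the function names are illustrative, and the df = 1 p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ so that only the standard library is needed.

```python
import math

def g2(table):
    """Likelihood-ratio statistic G^2 for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            m_ij = row_totals[i] * col_totals[j] / n
            if n_ij > 0:  # empty cells contribute zero
                total += n_ij * math.log(n_ij / m_ij)
    return 2 * total

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square with exactly 1 df."""
    return math.erfc(math.sqrt(x / 2))

full = [[73, 279, 225],
        [47, 165, 191]]

# Subtable 1: Independent vs Democrat only.
sub1 = [[73, 279],
        [47, 165]]

# Subtable 2: (Independent + Democrat) vs Republican.
sub2 = [[73 + 279, 225],
        [47 + 165, 191]]

g2_sub1, g2_sub2, g2_full = g2(sub1), g2(sub2), g2(full)
print(f"G2 subtable 1 = {g2_sub1:.2f}, p-value = {chi2_sf_df1(g2_sub1):.3f}")
print(f"G2 subtable 2 = {g2_sub2:.2f}, p-value = {chi2_sf_df1(g2_sub2):.4f}")
print(f"sum = {g2_sub1 + g2_sub2:.2f}, full table = {g2_full:.2f}")
```

The equality of the sum and the full-table value is exact algebraically for this partition, so any difference in the printed values comes only from floating-point rounding.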

It might seem natural to compute $G^2$ for separate $2 \times 2$ tables that pair each column with a particular one, say the first or the last. However, the resulting component statistics are not independent and thus do not sum to $G^2$ for the complete table. Certain rules must be followed to ensure that $G^2$ for the complete table partitions into independent components. Agresti indicates that Goodman (1968, 1969, 1971), Irwin (1949), Iverson (1979) and Lancaster (1949) discuss rules that help in determining subtables for which the components of $G^2$ are independent. Among these rules are the following necessary conditions:

1) The degrees of freedom for the subtables must sum to the degrees of freedom for the original table.
2) Each cell count in the original table must be a cell count in one and only one subtable.
3) Each marginal total of the original table must be a marginal total for one and only one subtable.
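To see why the "natural" pairing fails, the following minimal Python sketch pairs each of the first two columns of the GSS table with the last column (Republican) and compares the summed components with $G^2$ for the full table. The helper `column_pair` is a name introduced for this illustration. Note that the Republican counts 225 and 191 appear in both subtables, so condition 2 is violated even though the degrees of freedom still sum to 2.

```python
import math

def g2(table):
    """Likelihood-ratio statistic G^2 for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            m_ij = row_totals[i] * col_totals[j] / n
            if n_ij > 0:  # empty cells contribute zero
                total += n_ij * math.log(n_ij / m_ij)
    return 2 * total

def column_pair(table, a, b):
    """Subtable made of columns a and b (0-based) of the given table."""
    return [[row[a], row[b]] for row in table]

full = [[73, 279, 225],
        [47, 165, 191]]

# Pair each of the first two columns with the last one (Republican).
ind_vs_rep = column_pair(full, 0, 2)
dem_vs_rep = column_pair(full, 1, 2)

naive_sum = g2(ind_vs_rep) + g2(dem_vs_rep)
g2_full = g2(full)
print(f"sum of pairwise components = {naive_sum:.2f}")
print(f"G2 for the full table      = {g2_full:.2f}")
```

The two printed values differ, in contrast with the Independent/Democrat and (Independent + Democrat)/Republican partition, where every cell count and every marginal total of the original table is used exactly once.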