Chi-Square Analyses Stat 251
|
|
- Penelope Hunter
- 6 years ago
- Views:
Transcription
1 Chi-Square Analyses Stat 251 While we have analyses for comparing more than 2 means, we cannot use them when trying to compare more than one proportion. However, there is a distribution that is related to the standard normal distribution (z) that works for comparing more than two proportions and can be used to ask some of the following questions about populations: Do the data follow a specified distribution? Are the categories independent of each other? Are the probabilities distributed the same over all categories? Rather than a test statistic for each pair of proportions, we d rather like to use just one. What we do is measure the distance each sample value is from the average (from the norm ). If we had a z-score for each pair, the sum of the squared z-scores would be a new (new to you) distribution called Chi-square (pronounced ky as in sky ), denoted by χ 2. The distribution is a skewed distribution (skewed right) so it is not a symmetric distribution like z or t. χ 2 n 1 = n zi 2 = zi 2 + z zn 2 i The following graph will show how the χ 2 distribution changes shape with increasing df. x=rchisq(1000,df=2); hist(x,prob=t,ylim=c(0,.5),xlim=c(0,15),main='') title('chi-square Distributions \nwith 3 different df') curve(dchisq(x,df=2),add=t,lwd=3,col="blue") curve(dchisq(x,df=4),add=t,lwd=3,col="red") curve(dchisq(x,df=9),add=t,lwd=3,col='green') dfs=c("df = 2 ","df = 4 ","df = 9 "); colours=c('blue','red','green') legend('topright',legend=dfs,lwd=c(3,3,3),col=colours) Chi Square Distributions with 3 different df Density df = 2 df = 4 df = x 1
2 Assumptions of any Chi-square test 1. The data must be counts from categories 2. Independence of observations 3. E i 5; each individual expected value (E i ) must be at least 5 Goodness-of-Fit (GoF) Chi-square for a one-way table (a table that has categories and counts for each category): In evaluating whether there is sufficient evidence that a set of observed counts, O 1, O 2,, O k in k categories are unusually different from what would be expected under a null hypothesis. The expected values under the null hypothesis, called E 1, E 2,..., E k. The hypotheses for GoF test are: Null hypothesis H 0 : p 1 = p 01, p 2 = p 02,, p k = p 0k Or H 0 : p 1 = p 2 = = p k = p 0 Or H 0 : The data follows theoretical distribution Alternative hypothesis H a : At least one p i differs from theoretical probability Or Expected values H a : The data does not follow theoretical distribution E i = np i You will need to find the probabilities associated with the null hypothesized distribution, then multiple each category value by the probability to get the expected value. Test statistic Rejection region Reject H 0 if χ 2 calc χ2 α,df χ 2 = (O i E i ) 2 = (observed expected) 2 E i expected where df = k 1 (k = number of categories). Conclusion (in context) When the null hypothesis is rejected, in terms of the context of the data, it means that we think that the data does not follow the theoretical (specified) distribution. When we fail to reject the null hypothesis, we are maintaining that the data does follow the theoretical (specified) distribution. 2
3 GoF example A paper 1 about an experiment reported the following data on phenotypes resulting from crossing tall cut-leaf tomatoes with dwarf potato-leaf tomatoes. We wish to investigate if the frequencies below are consistent with Mendel s laws of inheritance which implies that the phenotypes should occur in a 9:3:3:1 ratio. Tall cut-leaf Tall potato-leaf Dwarf cut-leaf Dwarf potato-leaf Ratio Count A 9:3:3:1 ratio means the probabilities are 9/16, 3/16 ( 2) and 1/16 (the 16 comes from the sum of all the numbers in the ratio). Is there sufficient evidence that the tomato plants follow Mendel s Law? Keep in mind that when you have to do this in R, I have provided the code to get the data in so make sure you pay most attention to parts of code that have the chisq.test() commands and examples obs <- c(926,288,293,104) types <- c("tall cut","tall potato","dwarf cut","dwarf potato") rnames=c("obs","exp") k <- length(types) mendel <- c(9,3,3,1); mendel [1] probs <- mendel/sum(mendel); probs [1] n <- sum(obs); n [1] 1611 exp <- round(n*probs,2); exp [1] tomatoes <- data.frame(types,obs,exp,probs); tomatoes types obs exp probs 1 Tall cut Tall potato Dwarf cut Dwarf potato Linkage Studies of the Tomato (Transactions of the Royal Canadian Institute, 1931: 1-19) 3
4 # barplot of data (barplot since data is categorical) barplot(obs,names.arg=types,col=rainbow(4),ylab="observed",ylim=c(0,1000)) title('barplot of Tomato Plant \ndihybrids (obs)') Barplot of Tomato Plant dihybrids (obs) Observed Tall cut Dwarf cut barplot(exp,names.arg=types,col=rainbow(4),ylab="expected",ylim=c(0,1000)) title('barplot of Tomato Plant \ndihybrids (exp)') Barplot of Tomato Plant dihybrids (exp) Expected Tall cut Dwarf cut 4
5 # barplot with obs and exp counts <- rbind(obs, exp); colnames(counts) <- types colors=c("cornflowerblue","darkorchid4") barplot(counts,col=colors,legend=rnames,beside=t,ylim=c(0,1000)) title('barplot of Tomato Plant \ndihybrids Obs and Exp') Barplot of Tomato Plant dihybrids Obs and Exp obs exp Tall cut Dwarf cut Hypotheses H 0 : p 1 = 9 16, p 2 = p 3 = 3 16, p 4 = 1 16 or H 0: data follows Mendel s Law H a : At least one p i differs or H a : data does not follow Mendel s Law Expected values E 1 = 1611(9/16) = E 2 = 1611(3/16) = E 3 = 1611(3/16) = E 4 = 1611(1/16) = Here we can check to see all E i 5 Test Statistic E i = n(p i ) χ 2 = (O i E i ) 2 = (observed expected) 2 E i expected = ( ) ( ) ( ) ( ) = df = k 1 where k = 4 so df = 4 1 = 3 Rejection Region 1. Critical Value approach: Reject H 0 if χ 2 calc χ2 α,df where χ2 α,df = χ2.05,3 = pvalue approach: Reject H 0 if pvalue α Results We are doing the critical value approach in the by-hand example. So χ 2.05,3 = so we will fail to reject H 0. Conclusion (in context) We failed to reject H 0 so that tells us that the plants do follow Mendel s law (maintaining the 9:3:3:1 ratio). 5
6 GoF in R The command to use is: chisq.test(x,p=probs,simulate.p.value=f) Where: x: x is either a vector of counts or a minimum of at least a 2X2 table p=probs: the probabilities of the theoretical (specified) distribution simulate.p.value: F (default); T when E i 5 assumption is not met # HERE!!! Pay attention here!!! # Seriously, this part below is what you need after you # get the data in and change names of objects (dataset) # with chisq.test() GoF=chisq.test(obs,p=probs); GoF Chi-squared test for given probabilities data: obs X-squared = , df = 3, p-value = GoF$expected [1] # long way without chisq.test() chisq.calc <- sum((obs-exp)^2/exp); chisq.calc [1] df=k-1 pvalue <- 1-pchisq(chisq.calc,3); pvalue [1] Since the pvalue = α(0.05), we fail to reject H 0 (see conclusion in context from by-hand example). Test of Independence The test of Independence explores whether two categorical random variables are independent or whether some level of dependency existst between them. Each dataset will be constructed into a table with I rows and J columns. Let n ij denote the number of individuals in the sample falling in the (i, j) th cell (of row i, column j) of the table. The following is a prototype of a general table that displays the counts (n ij ) and is called a two-way contingency table. I and J (capital I,J) are the row and column totals, respectively. 6
7 j... J 1 n 11 n n 1j... n 1J 2 n i n i1... n ij.... I n I1... n IJ The hypotheses for Independence test are: Null hypothesis H 0 : p ij = (p i )(p j ) i = 1, 2,..., I and j = 1, 2,..., J Or H 0 : The row context and column context are independent Alternative hypothesis H a : H 0 is not true Expected values E ij = n in j n = (rtotal)(ctotal) grandtotal Test statistic n j n i χ 2 = i=1 j=1 (O ij E ij ) 2 E ij = all r c (observed expected) 2 expected = (O ij E ij ) 2 E ij Rejection region Reject H 0 if χ 2 calc χ2 α,df where df = (r 1)(c 1) (r = number of rows, c = number of columns). Conclusion (in context) When the null hypothesis is rejected, in terms of the context of the data, it means that we think that the context of the rows and context of the columns are dependent (there is a dependency). When we fail to reject the null hypothesis, we are maintaining that the context of the rows and context of the columns are dependent (there is no relationship). Independence Test example A study of the relationship between facility conditions at gasoline stations and aggressiveness in the pricing of gasoline 2 reports the accompanying data based on a sample of n = 441 stations. Does the data suggest that facility conditions and pricing policy are independent of one another? gas <- as.table(rbind(c(24,15,17), c(52,73,80),c(58,86,36))) cond=c("substandard","standard","modern"); price=c("aggressive","neutral", "Nonaggressive") dimnames(gas) <- list(condition=cond,pricing.policy=price); gas 2 An Analysis of Price Aggressiveness in Gasoline Marketing, (Journal Marketing Research, 1970: 36-42) 7
8 Pricing.Policy Condition Aggressive Neutral Nonaggressive Substandard Standard Modern gaspr=as.matrix(gas) # row totals n1j=sum(gas[1,]); n2j=sum(gas[2,]); n3j=sum(gas[3,]); n123j=c(n1j,n2j,n3j) # column totals ni1=sum(gas[,1]); ni2=sum(gas[,2]); ni3=sum(gas[,3]); ni123=c(ni1,ni2,ni3) # grand total n=sum(gas) # Expected value for row1,col1 exp11 <- round((n1j*ni1)/n,2); exp11 [1] exp=round((ni123 %*% t(n123j))/n,2) dimnames(exp) <- list(condition=cond,pricing.policy=price) expt=matrix(exp,ncol=3,byrow=t) # barplot of data (barplot since data is categorical) barplot(gas,names.arg=rownames(gas),col=rainbow(9),ylab="observed",ylim=c(0,225)) title('condition by Pricing (obs)') Condition by Pricing (obs) Observed Substandard Modern 8
9 barplot(exp,names.arg=rownames(gas),col=rainbow(9),ylab="expected",ylim=c(0,225)) title('condition by Pricing (exp)') Condition by Pricing (exp) Expected Substandard Modern # barplot with obs and exp counts <- rbind(gaspr,expt) barplot(counts,legend=rownames(gas),beside=t,ylim=c(0,100)) title('condition by Pricing Obs and Exp') Condition by Pricing Obs and Exp Substandard Standard Modern Aggressive Neutral Null Hypothesis H 0 : p ij = (p i )(p j ) i = 1, 2,..., I and j = 1, 2,..., J (pricing and conditions in gasoline marketing are independent) Alternative hypothesis H a : H 0 is not true (pricing and conditions in gasoline are dependent) Expected values E ij = n in j n = (rtotal)(ctotal) grandtotal E 11 = (56 134/441) = E 12 = (56 174/441) =
10 E 13 = (56 133/441) = E 21 = ( /441) = 22.1 E 22 = ( /441) = E 23 = ( /441) = E 31 = ( /441) = E 32 = ( /441) = E 33 = ( /441) = Here we can check to see all E ij 5 Test Statistic (same as the Independence Test Statistic calculation) n j n i χ 2 = i=1 j=1 (O ij E ij ) 2 E ij = all r c (observed expected) 2 expected = (O ij E ij ) 2 E ij = ( ) ( ) ( ) = df = (r 1)(c 1) where r, c = (3, 3) so df = (3 1)(3 1) = 4 Rejection Region 1. Critical Value approach: Reject H 0 if χ 2 calc χ2 α,df where χ2 α,df = χ2.05,4 = pvalue approach: Reject H 0 if pvalue α Results We are doing the critical value approach in the by-hand example. So χ 2.05,4 = so we will reject H 0. Conclusion (in context) We rejected H 0 so that tells us that knowledge of a stations s pricing policy does give information about the condition of facilities at the station. Stations with an aggressive pricing policy appear to have more substandard facilities than stations with a neutral or nonaggressive policy. # HERE!!! Pay attention here!!! # Seriously, this part below is what you need after you # get the data in and change names of objects (dataset) # with chisq.test() indep=chisq.test(gas); indep Pearson's Chi-squared test data: gas X-squared = , df = 4, p-value = # long way without chisq.test() chisq.calc <- round(sum((gaspr-expt)^2/expt),3); chisq.calc [1] pvalue <- 1-pchisq(chisq.calc,4); pvalue [1]
11 indep$expected Pricing.Policy Condition Aggressive Neutral Nonaggressive Substandard Standard Modern Since the pvalue = α(0.05), we reject H 0 (see conclusion in context from by-hand example). Homogeneous Test We are assuming that each individual in every one of the I populations belongs in exactly one of J categories. The hypotheses for Homogeneous test are: Null hypothesis H 0 : p 1j = p 2j =... = p Ij j=1,2,...,j, i=1,2,...,i Or Alternative hypothesis Expected values Same as Independence Test Test statistic Same as Independence Test Rejection region Same as Independence Test H 0 : The row context is distributed homogeneously (the same) over the column context H a : H 0 is not true Conclusion (in context) When the null hypothesis is rejected, in terms of the context of the data, it means that we think that the context of the rows are distributed differently across the context of the columns. When we fail to reject the null hypothesis, we are maintaining that the context of the rows are distributed similarly across the context of the columns. Homogeneous example A company packages a particular product in cans of three different sizes, each one using a different production line. Most cans conform to specifications, but a quality control engineer has identified blemish on a can, crack in the can, improper pull tab location, pull tab missing or some other as main reasons for nonconformities. A sample of nonconforming units is selected from each of the three lines, and each unit is categorical according to reason for nonconformity. Production line Blemish Crack Location Missing Other
12 Do the data suggest that the proportions failing in the various nonoconformance categories are not the same for the three lines? defects <- as.table(rbind(c(34,65,17,21,13), c(23,52,25,19,6),c(32,28,16,14,10))) prodl=1:3; nonconf=c("blemish","crack", "Location","Missing","Other") dimnames(defects) <- list(productionline=prodl,nonconformity=nonconf) defects Nonconformity ProductionLine Blemish Crack Location Missing Other fail=as.matrix(defects) # row totals; n1j=sum(defects[1,]); n2j=sum(defects[2,]); n3j=sum(defects[3,]); n123j=c(n1j,n2j,n3j) # column totals and grand total ni1=sum(defects[,1]); ni2=sum(defects[,2]); ni3=sum(defects[,3]); ni4=sum(defects[,4]) ni5=sum(defects[,5]); ni12345=c(ni1,ni2,ni3,ni4,ni5); n=sum(defects) # Expected value for row1,col1 exp11 <- round((n1j*ni1)/n,2); exp11 [1] 35.6 # barplot of data (barplot since data is categorical); first do data prep for barplots expij=round((ni12345 %*% t(n123j))/n,2) exp=matrix(expij,ncol=5,byrow=t) dimnames(exp) <- list(production.line=prodl,nonconformity=nonconf) barplot(defects,names.arg=colnames(defects),col=rainbow(15),ylab="observed",ylim=c(0,175)) title('production Line Nonconformities \n(obs)') Production Line Nonconformities (obs) Observed Blemish Location Other 12
13 barplot(exp,names.arg=colnames(defects),col=rainbow(15),ylab="expected",ylim=c(0,160)) title('production Line Nonconformities \n(exp)') Production Line Nonconformities (exp) Expected Blemish Location Other # barplot with obs and exp counts <- rbind(fail,exp) barplot(counts,legend=rownames(defects),beside=t,ylim=c(0,75)) title('production Line Nonconformities \n Obs and Exp') Production Line Nonconformities Obs and Exp Blemish Location Other 13
14 Hypotheses Null hypothesis H 0 : p 1j = p 2j =... = p Ij j=1,2,...,j, i=1,2,...,i Or H 0 : Blemish types have a homogeneous distribution over the prouction lines (no one production line has more types of blemishes than any other production line) Alternative hypothesis Expected values E 11 = (158)(97) = 35.6 E 12 = (158)(145) = 58 E 13 = (158)(58) = 23.2 E 14 = (158)(54) = 21.6 E 15 = (158)(29) = 11.6 E 21 = (125)(97) = E 22 = (125)(145) = E 23 = (125)(58) = E 24 = (125)(54) = 18 E 25 = (125)(29) = 9.67 E 31 = (100)(97) = E 32 = (100)(145) = E 33 = (100)(58) = E 34 = (100)(54) = 14.4 E 35 = (100)(29) = 7.73 Here we can check to see all E ij 5 Test Statistic H a : H 0 is not true E ij = (n i)(n j ) n = (rtotal)(ctotal) grandtotal n j n i χ 2 = i=1 j=1 (O ij E ij ) 2 E ij = all r c (observed expected) 2 expected = (O ij E ij ) 2 E ij = ( ) (65 58) ( ) = df = (r 1)(c 1) where r, c = (3, 5) so df = (3 1)(5 1) = 2 4 = 8 Rejection Region 1. Critical Value approach: Reject H 0 if χ 2 calc χ2 α,df where χ2 α,df = χ2.05,8 = pvalue approach: Reject H 0 if pvalue α Results We are doing the critical value approach in the by-hand example. So χ 2.05,8 = so we will fail to reject H 0. Conclusion (in context) We failed to reject H 0 so that tells us that the production lines have the same rates of different types of nonconformities, so no one production line produces more of a specific type of nonconformity. 14
15 # HERE!!! Pay attention here!!! # Seriously, this part below is what you need after you # get the data in and change names of objects (dataset) # with chisq.test() hgeneous=chisq.test(defects); hgeneous Pearson's Chi-squared test data: defects X-squared = , df = 8, p-value = hgeneous$expected; exp Nonconformity ProductionLine Blemish Crack Location Missing Other Nonconformity Production.Line Blemish Crack Location Missing Other # long way without chisq.test() chisq.calc <- round(sum((fail-exp)^2/exp),3); chisq.calc [1] pvalue <- 1-pchisq(chisq.calc,8); pvalue [1]
Chi-square (χ 2 ) Tests
Math 442 - Mathematical Statistics II April 30, 2018 Chi-square (χ 2 ) Tests Common Uses of the χ 2 test. 1. Testing Goodness-of-fit. 2. Testing Equality of Several Proportions. 3. Homogeneity Test. 4.
More informationChi-square (χ 2 ) Tests
Math 145 - Elementary Statistics April 17, 2007 Common Uses of the χ 2 test. 1. Testing Goodness-of-fit. Chi-square (χ 2 ) Tests 2. Testing Equality of Several Proportions. 3. Homogeneity Test. 4. Testing
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More information:the actual population proportion are equal to the hypothesized sample proportions 2. H a
AP Statistics Chapter 14 Chi- Square Distribution Procedures I. Chi- Square Distribution ( χ 2 ) The chi- square test is used when comparing categorical data or multiple proportions. a. Family of only
More informationLecture 22. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
Lecture 22 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 19, 2007 1 2 3 4 5 6 7 8 9 1 tests for equivalence of two binomial 2 tests for,
More information10.2: The Chi Square Test for Goodness of Fit
10.2: The Chi Square Test for Goodness of Fit We can perform a hypothesis test to determine whether the distribution of a single categorical variable is following a proposed distribution. We call this
More informationChapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.
Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:
More informationHYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC
1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationCh. 11 Inference for Distributions of Categorical Data
Ch. 11 Inference for Distributions of Categorical Data CH. 11 2 INFERENCES FOR RELATIONSHIPS The two sample z procedures from Ch. 10 allowed us to compare proportions of successes in two populations or
More informationLecture 28 Chi-Square Analysis
Lecture 28 STAT 225 Introduction to Probability Models April 23, 2014 Whitney Huang Purdue University 28.1 χ 2 test for For a given contingency table, we want to test if two have a relationship or not
More informationStatistics - Lecture 04
Statistics - Lecture 04 Nicodème Paul Faculté de médecine, Université de Strasbourg file:///users/home/npaul/enseignement/esbs/2018-2019/cours/04/index.html#40 1/40 Correlation In many situations the objective
More informationStatistics 251: Statistical Methods
Statistics 251: Statistical Methods 1-sample Hypothesis Tests Module 9 2018 Introduction We have learned about estimating parameters by point estimation and interval estimation (specifically confidence
More informationThe Chi-Square Distributions
MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness
More informationGoodness of Fit Tests
Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of
More informationChapter 26: Comparing Counts (Chi Square)
Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces
More information11-2 Multinomial Experiment
Chapter 11 Multinomial Experiments and Contingency Tables 1 Chapter 11 Multinomial Experiments and Contingency Tables 11-11 Overview 11-2 Multinomial Experiments: Goodness-of-fitfit 11-3 Contingency Tables:
More informationModule 10: Analysis of Categorical Data Statistics (OA3102)
Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this
More informationWe know from STAT.1030 that the relevant test statistic for equality of proportions is:
2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which
More informationOne-Way Tables and Goodness of Fit
Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the
More informationIntroduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution
Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis
More informationChapter 10: Chi-Square and F Distributions
Chapter 10: Chi-Square and F Distributions Chapter Notes 1 Chi-Square: Tests of Independence 2 4 & of Homogeneity 2 Chi-Square: Goodness of Fit 5 6 3 Testing & Estimating a Single Variance 7 10 or Standard
More informationStatistics 301: Probability and Statistics 1-sample Hypothesis Tests Module
Statistics 301: Probability and Statistics 1-sample Hypothesis Tests Module 9 2018 Student s t graphs For the heck of it: x
More informationChi-Square. Heibatollah Baghi, and Mastee Badii
1 Chi-Square Heibatollah Baghi, and Mastee Badii Different Scales, Different Measures of Association Scale of Both Variables Nominal Scale Measures of Association Pearson Chi-Square: χ 2 Ordinal Scale
More informationNominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers
Nominal Data Greg C Elvers 1 Parametric Statistics The inferential statistics that we have discussed, such as t and ANOVA, are parametric statistics A parametric statistic is a statistic that makes certain
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationDegrees of freedom df=1. Limitations OR in SPSS LIM: Knowing σ and µ is unlikely in large
Z Test Comparing a group mean to a hypothesis T test (about 1 mean) T test (about 2 means) Comparing mean to sample mean. Similar means = will have same response to treatment Two unknown means are different
More informationTHE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future
More informationStatistics for laboratory scientists II
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information10.2 Hypothesis Testing with Two-Way Tables
10.2 Hypothesis Testing with Two-Way Tables Part 2: more examples 3x3 Two way table 2x3 Two-way table (worksheet) 1 Example 2: n Is there an association between the type of school area and the students'
More informationStatistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017
Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017 I. χ 2 or chi-square test Objectives: Compare how close an experimentally derived value agrees with an expected value. One method to
More informationChapter 11. Hypothesis Testing (II)
Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let
More informationThe Chi-Square Distributions
MATH 03 The Chi-Square Distributions Dr. Neal, Spring 009 The chi-square distributions can be used in statistics to analyze the standard deviation of a normally distributed measurement and to test the
More informationSTAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).
STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example
More informationExample. χ 2 = Continued on the next page. All cells
Section 11.1 Chi Square Statistic k Categories 1 st 2 nd 3 rd k th Total Observed Frequencies O 1 O 2 O 3 O k n Expected Frequencies E 1 E 2 E 3 E k n O 1 + O 2 + O 3 + + O k = n E 1 + E 2 + E 3 + + E
More informationChi-Squared Tests. Semester 1. Chi-Squared Tests
Semester 1 Goodness of Fit Up to now, we have tested hypotheses concerning the values of population parameters such as the population mean or proportion. We have not considered testing hypotheses about
More information13.1 Categorical Data and the Multinomial Experiment
Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)
More informationχ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.
I. T or F. (1 points each) 1. The χ -distribution is symmetric. F. The χ may be negative, zero, or positive F 3. The chi-square distribution is skewed to the right. T 4. The observed frequency of a cell
More informationHomework 9 Sample Solution
Homework 9 Sample Solution # 1 (Ex 9.12, Ex 9.23) Ex 9.12 (a) Let p vitamin denote the probability of having cold when a person had taken vitamin C, and p placebo denote the probability of having cold
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationQuantitative Analysis and Empirical Methods
Hypothesis testing Sciences Po, Paris, CEE / LIEPP Introduction Hypotheses Procedure of hypothesis testing Two-tailed and one-tailed tests Statistical tests with categorical variables A hypothesis A testable
More informationContingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878
Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each
More informationStudy Ch. 13.1, # 1 4 all Study Ch. 13.2, # 9 15, 25, 27, 31 [# 11 17, ~27, 29, ~33]
GOALS: 1. Learn the properties of the χ 2 Distribution. 2. Understand how the shape of the χ 2 Distribution changes as the df increases. 3. Be able to find p values. 4. Recognize that χ 2 tests are right
More informationLecture 41 Sections Mon, Apr 7, 2008
Lecture 41 Sections 14.1-14.3 Hampden-Sydney College Mon, Apr 7, 2008 Outline 1 2 3 4 5 one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories,
More informationij i j m ij n ij m ij n i j Suppose we denote the row variable by X and the column variable by Y ; We can then re-write the above expression as
page1 Loglinear Models Loglinear models are a way to describe association and interaction patterns among categorical variables. They are commonly used to model cell counts in contingency tables. These
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationSTP 226 ELEMENTARY STATISTICS NOTES
STP 226 ELEMENTARY STATISTICS NOTES PART 1V INFERENTIAL STATISTICS CHAPTER 12 CHI SQUARE PROCEDURES 12.1 The Chi Square Distribution A variable has a chi square distribution if the shape of its distribution
More informationRegression Analysis: Exploring relationships between variables. Stat 251
Regression Analysis: Exploring relationships between variables Stat 251 Introduction Objective of regression analysis is to explore the relationship between two (or more) variables so that information
More informationStatistics for Managers Using Microsoft Excel
Statistics for Managers Using Microsoft Excel 7 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Statistics for Managers Using Microsoft Excel 7e Copyright 014 Pearson Education, Inc. Chap
More informationFebruary 5, 2018 START HERE. measurement scale. Means. number of means. independent measures t-test Ch dependent measures t-test Ch 16.
χ Test for Frequencies February 5, 018 Contents Chi squared (χ ) test for frequencies Example 1: left vs right handers in our class The χ distribution One or two tailed? Example : Birthdays by month Using
More informationGoodness of Fit Goodness of fit - 2 classes
Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence
More informationLecture 9. Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests
Lecture 9 Selected material from: Ch. 12 The analysis of categorical data and goodness of fit tests Univariate categorical data Univariate categorical data are best summarized in a one way frequency table.
More informationDISTRIBUTIONS USED IN STATISTICAL WORK
DISTRIBUTIONS USED IN STATISTICAL WORK In one of the classic introductory statistics books used in Education and Psychology (Glass and Stanley, 1970, Prentice-Hall) there was an excellent chapter on different
More informationTopic 21 Goodness of Fit
Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationFormulas and Tables by Mario F. Triola
Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x f # x x f Mean 1x - x s - 1 n 1 x - 1 x s 1n - 1 s B variance s Ch. 4: Probability Mean (frequency table) Standard deviation P1A or
More informationCHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC
CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests
More informationHypothesis Testing: Chi-Square Test 1
Hypothesis Testing: Chi-Square Test 1 November 9, 2017 1 HMS, 2017, v1.0 Chapter References Diez: Chapter 6.3 Navidi, Chapter 6.10 Chapter References 2 Chi-square Distributions Let X 1, X 2,... X n be
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationSection VII. Chi-square test for comparing proportions and frequencies. F test for means
Section VII Chi-square test for comparing proportions and frequencies F test for means 0 proportions: chi-square test Z test for comparing proportions between two independent groups Z = P 1 P 2 SE d SE
More informationSummary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)
Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ
More informationχ 2 Test for Frequencies January 15, 2019 Contents
χ 2 Test for Frequencies January 15, 2019 Contents Chi squared (χ 2 ) test for frequencies Example 1: left vs right handers in our class The χ 2 distribution One or two tailed? Example 2: Birthdays by
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationLab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d
BIOS 4120: Introduction to Biostatistics Breheny Lab #11 We will explore observational studies in today s lab and review how to make inferences on contingency tables. We will only use 2x2 tables for today
More informationSTAT 328 (Statistical Packages)
Department of Statistics and Operations Research College of Science King Saud University Exercises STAT 328 (Statistical Packages) nashmiah r.alshammari ^-^ Excel and Minitab - 1 - Write the commands of
More informationChi-squared (χ 2 ) (1.10.5) and F-tests (9.5.2) for the variance of a normal distribution ( )
Chi-squared (χ ) (1.10.5) and F-tests (9.5.) for the variance of a normal distribution χ tests for goodness of fit and indepdendence (3.5.4 3.5.5) Prof. Tesler Math 83 Fall 016 Prof. Tesler χ and F tests
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationEpidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval
Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor
More informationEcon 325: Introduction to Empirical Economics
Econ 325: Introduction to Empirical Economics Chapter 9 Hypothesis Testing: Single Population Ch. 9-1 9.1 What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population
More information11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should
11 CHI-SQUARED Chapter 11 Chi-squared Objectives After studying this chapter you should be able to use the χ 2 distribution to test if a set of observations fits an appropriate model; know how to calculate
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationAnalysis of 2x2 Cross-Over Designs using T-Tests
Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous
More informationChapte The McGraw-Hill Companies, Inc. All rights reserved.
er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations
More informationTHE PEARSON CORRELATION COEFFICIENT
CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There
More informationChapter 10. Discrete Data Analysis
Chapter 1. Discrete Data Analysis 1.1 Inferences on a Population Proportion 1. Comparing Two Population Proportions 1.3 Goodness of Fit Tests for One-Way Contingency Tables 1.4 Testing for Independence
More informationWhat Is ANOVA? Comparing Groups. One-way ANOVA. One way ANOVA (the F ratio test)
What Is ANOVA? One-way ANOVA ANOVA ANalysis Of VAriance ANOVA compares the means of several groups. The groups are sometimes called "treatments" First textbook presentation in 95. Group Group σ µ µ σ µ
More informationPsych Jan. 5, 2005
Psych 124 1 Wee 1: Introductory Notes on Variables and Probability Distributions (1/5/05) (Reading: Aron & Aron, Chaps. 1, 14, and this Handout.) All handouts are available outside Mija s office. Lecture
More informationLecture 21 Comparing Counts - Chi-square test
Lecture 21 Comparing Counts - Chi-square test Thais Paiva STA 111 - Summer 2013 Term II August 5, 2013 1 / 20 Thais Paiva STA 111 - Summer 2013 Term II Lecture 21, 08/05/2013 Lecture Plan 1 Goodness of
More informationChapter 10. Prof. Tesler. Math 186 Winter χ 2 tests for goodness of fit and independence
Chapter 10 χ 2 tests for goodness of fit and independence Prof. Tesler Math 186 Winter 2018 Prof. Tesler Ch. 10: χ 2 goodness of fit tests Math 186 / Winter 2018 1 / 26 Multinomial test Consider a k-sided
More informationChapter 10 Nonlinear Models
Chapter 10 Nonlinear Models Nonlinear models can be classified into two categories. In the first category are models that are nonlinear in the variables, but still linear in terms of the unknown parameters.
More informationSTAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015
STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis
More informationData analysis and Geostatistics - lecture VII
Data analysis and Geostatistics - lecture VII t-tests, ANOVA and goodness-of-fit Statistical testing - significance of r Testing the significance of the correlation coefficient: t = r n - 2 1 - r 2 with
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationData analysis and Geostatistics - lecture VI
Data analysis and Geostatistics - lecture VI Statistical testing with population distributions Statistical testing - the steps 1. Define a hypothesis to test in statistics only a hypothesis rejection is
More informationANOVA (Analysis of Variance) output RLS 11/20/2016
ANOVA (Analysis of Variance) output RLS 11/20/2016 1. Analysis of Variance (ANOVA) The goal of ANOVA is to see if the variation in the data can explain enough to see if there are differences in the means.
More informationFormulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc.
Formulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc. Ch. 2: Descriptive Statistics x Sf. x x Sf Mean S(x 2 x) 2 s 2
More information(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.
FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationOrdinal Variables in 2 way Tables
Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables
More informationHotelling s One- Sample T2
Chapter 405 Hotelling s One- Sample T2 Introduction The one-sample Hotelling s T2 is the multivariate extension of the common one-sample or paired Student s t-test. In a one-sample t-test, the mean response
More informationMultiple Sample Categorical Data
Multiple Sample Categorical Data paired and unpaired data, goodness-of-fit testing, testing for independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationPart 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2
Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts
More informationCh. 7. One sample hypothesis tests for µ and σ
Ch. 7. One sample hypothesis tests for µ and σ Prof. Tesler Math 18 Winter 2019 Prof. Tesler Ch. 7: One sample hypoth. tests for µ, σ Math 18 / Winter 2019 1 / 23 Introduction Data Consider the SAT math
More informationPractice exam questions
Practice exam questions Nathaniel Higgins nhiggins@jhu.edu, nhiggins@ers.usda.gov 1. The following question is based on the model y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + u. Discuss the following two hypotheses.
More informationPhysics 509: Non-Parametric Statistics and Correlation Testing
Physics 509: Non-Parametric Statistics and Correlation Testing Scott Oser Lecture #19 Physics 509 1 What is non-parametric statistics? Non-parametric statistics is the application of statistical tests
More informationRon Heck, Fall Week 3: Notes Building a Two-Level Model
Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level
More informationSections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal
More information