CHAPTER 6
GOODNESS OF FIT AND CONTINGENCY TABLE

Expected Outcomes
1. Able to test the goodness of fit for categorical data.
2. Able to test whether categorical data fit a given distribution such as the Binomial, Normal or Poisson.
3. Able to use a contingency table to test for independence and homogeneity of proportions.

PREPARED BY: DR SITI ZANARIAH SATARI & FARAHANIM MISNI
Contents
6.1 Goodness of Fit Test
  6.1.1 Goodness of Fit Test for Categorical Data
  6.1.2 Fitting of the Distribution
6.2 Contingency Table
  6.2.1 Testing for Independence between Two Variables
  6.2.2 Test of Homogeneity of Proportions
6.1 GOODNESS OF FIT TEST

When to use the Chi-Square Distribution?
1. Find a confidence interval for a variance or standard deviation.
2. Test a hypothesis about a single variance or standard deviation.
3. Tests concerning frequency distributions for categorical data (Goodness of Fit).
4. Tests concerning probability distributions (Goodness of Fit).
5. Test the independence of two variables (Contingency Table).
6. Test the homogeneity of proportions (Contingency Table).
When to use the Goodness of Fit test?
1. To compare observed and expected frequencies for categorical data.
   Example: To meet customer demands, a manufacturer of running shoes may wish to see whether buyers show a preference for a specific style. If there were no preference, one would expect each style to be selected with equal frequency.
2. When you have some practical data and you want to know how well a particular statistical distribution (such as the Poisson, Binomial or Normal model) fits the data.
   Example: A researcher wishes to test whether the number of children in a family follows a Poisson distribution.
6.1.1 GOODNESS OF FIT TEST FOR CATEGORICAL DATA

Hypothesis: Null and Alternative
H0: There is no difference, no change or no preference.
H1: There is a difference, change or preference.
Or
H0: State the claim of the categorical distribution.
H1: The categorical distribution is not the same as stated in H0.

Example:
H0: Buyers show no preference for a specific style.
H1: Buyers show a preference for a specific style.
Assumptions/Conditions
1. The data are obtained from a random sample.
2. The variable under study is categorical.
3. The expected frequency for each category must be at least 5. If an expected frequency is less than 5, combine it with the adjacent category.
The Test Statistic

$$\chi^2_{test} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where
$O_i$ = observed frequency for category $i$
$E_i$ = expected frequency for category $i$, with $E_i = n p_i$, where $p_i$ is the probability for category $i = 1, 2, \ldots, k$
$k$ = the number of categories
degrees of freedom, $\nu = k - 1$
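Procedures
1. State the hypotheses and identify the claim.
2. Compute the test statistic value, $\chi^2_{test} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$.
3. Find the critical value, $\chi^2_{\alpha, k-1}$. The test is always right-tailed since the terms $(O_i - E_i)^2$ are squared and therefore never negative.
4. Make the decision: reject H0 if $\chi^2_{test} > \chi^2_{\alpha, k-1}$.
5. Draw a conclusion to reject or accept the claim.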
Why is this test called goodness of fit?
If the observed values and expected values are plotted together, one can see whether the values are close together or far apart.
- When observed and expected values are close together, the chi-square test value will be small. The decision is then not to reject H0, hence there is a good fit.
- When observed and expected values are far apart, the chi-square test value will be large. The decision is then to reject H0 (in favour of H1), hence there is not a good fit.
Example 1: GoF for Categorical Data
A market analyst wished to see whether consumers have any preference among five flavours of a new fruit soda. A sample of 100 people provided these data.

| Cherry | Strawberry | Orange | Lime | Grape |
|---|---|---|---|---|
| 32 | 28 | 16 | 14 | 10 |

Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavours at the 0.05 significance level?
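Example 1: solution
H0: There is no preference in the selection of fruit soda flavours (claim).
H1: There is a preference in the selection of fruit soda flavours.

With no preference, each of the 5 flavours has probability 1/5, so
$$E = np = 100 \times \frac{1}{5} = 20$$

| Frequency | Cherry | Strawberry | Orange | Lime | Grape |
|---|---|---|---|---|---|
| Observed (O) | 32 | 28 | 16 | 14 | 10 |
| Expected (E) | 20 | 20 | 20 | 20 | 20 |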
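Example 1: solution (continued)
$$\chi^2_{test} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(32-20)^2}{20} + \frac{(28-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(10-20)^2}{20} = 18.0$$
$$\chi^2_{critical} = \chi^2_{\alpha, k-1} = \chi^2_{0.05, 4} = 9.4877$$
Since $\chi^2_{test} = 18.0 > \chi^2_{0.05, 4} = 9.4877$, we reject H0.
At α = 0.05, there is enough evidence to reject the claim that there is no preference in the selection of fruit soda flavours.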
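The arithmetic of Example 1 can be checked with a few lines of Python. This is only a sketch: the function name `chi_square_statistic` is an illustrative choice, not part of the chapter, and the critical value is copied from the chi-square table value quoted in the solution rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts for Cherry, Strawberry, Orange, Lime, Grape (Example 1)
observed = [32, 28, 16, 14, 10]
n = sum(observed)            # sample size, 100
k = len(observed)            # number of categories, 5
expected = [n / k] * k       # no preference: E = n * (1/k) = 20 for each flavour

chi2_test = chi_square_statistic(observed, expected)
chi2_critical = 9.4877       # chi-square table value for alpha = 0.05, df = k - 1 = 4
reject_h0 = chi2_test > chi2_critical

print(f"test statistic = {chi2_test:.1f}, reject H0: {reject_h0}")
```

The same function works for any goodness of fit table as long as the expected frequencies are supplied in the same order as the observed ones.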
6.1.2 FITTING OF DISTRIBUTION

Hypothesis: Null and Alternative
H0: The population of a set of observed data comes from a specific distribution (Poisson/Binomial/Normal).
H1: The population of a set of observed data does not come from a specific distribution (Poisson/Binomial/Normal).

Example:
H0: The number of children in a family follows a Poisson distribution.
H1: The number of children in a family does not follow a Poisson distribution.
NOTES
1. The expected frequency for each category must be at least 5. If an expected frequency is less than 5, combine it with the adjacent category.
2. Reject H0 if $\chi^2_{test} > \chi^2_{\alpha, k-p-1}$, where $k$ is the number of categories (after any combining) and $p$ is the number of parameters in the hypothesized distribution estimated by sample statistics.
Procedures
1. State the hypotheses and identify the claim.
2. Compute the test value, $\chi^2_{test} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$. If an expected frequency is less than 5, it should be combined with the expected frequency in the adjacent class interval.
3. Find the critical value. The test is always right-tailed since the terms $(O_i - E_i)^2$ are squared and therefore never negative.
4. Make the decision: reject H0 if $\chi^2_{test} > \chi^2_{\alpha, k-p-1}$, where $p$ is the number of parameters in the hypothesized distribution estimated by sample statistics.
5. Draw a conclusion to reject or accept the claim.
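The combining rule in step 2 can be sketched in Python as follows. The helper `pool_small_classes` and the numbers in the demonstration are hypothetical and serve only to illustrate the rule; the chapter itself does the combining by hand. The sketch merges classes from left to right until each pooled expected frequency reaches 5, and folds any leftover at the end into the last pooled class.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Combine adjacent classes until every expected frequency is at least `minimum`."""
    pooled_o, pooled_e = [], []
    acc_o, acc_e, pending = 0, 0.0, False
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        pending = True
        if acc_e >= minimum:
            # The accumulated class is large enough: keep it and start a new one.
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o, acc_e, pending = 0, 0.0, False
    if pending:
        # Leftover classes at the end are merged into the previous pooled class.
        if pooled_o:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e

# Hypothetical frequencies: the last class has E = 3.5 < 5, so it joins its neighbour.
pooled_o, pooled_e = pool_small_classes([12, 9, 6, 3], [11.0, 9.5, 6.0, 3.5])
k = len(pooled_o)   # number of categories to use in the degrees of freedom k - p - 1
print(pooled_o, pooled_e, k)
```

Note that the degrees of freedom use the number of classes after combining, not the original number.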
Example 2: GoF for Fitting Distribution
The number of defects in printed circuit boards is hypothesized to follow a Poisson distribution. A random sample of 60 printed boards has been collected and the following numbers of defects observed.

| Number of defects | Observed frequency |
|---|---|
| 0 | 32 |
| 1 | 15 |
| 2 | 9 |
| 3 | 4 |

Test the hypothesis that the number of defects in printed circuit boards follows a Poisson distribution at α = 0.05.
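Example 2: solution
H0: The number of defects in printed circuit boards follows a Poisson distribution.
H1: The number of defects in printed circuit boards does not follow a Poisson distribution.

For the Poisson distribution, find the average value:
$$\hat{\lambda} = \frac{0(32) + 1(15) + 2(9) + 3(4)}{60} = 0.75$$
We estimated the value of λ, thus the number of estimated parameters is p = 1.

The probabilities are $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$ and the expected frequencies are $E_i = nP_i$ with n = 60.

| No. of defects | O | Probability | Expected frequency |
|---|---|---|---|
| 0 | 32 | $P_1 = P(X = 0) = \frac{e^{-0.75}(0.75)^0}{0!} = 0.4724$ | $E_1 = 60(0.4724) = 28.344$ |
| 1 | 15 | $P_2 = P(X = 1) = \frac{e^{-0.75}(0.75)^1}{1!} = 0.3543$ | $E_2 = 60(0.3543) = 21.258$ |
| 2 | 9 | $P_3 = P(X = 2) = \frac{e^{-0.75}(0.75)^2}{2!} = 0.1329$ | $E_3 = 60(0.1329) = 7.974$ |
| 3 (or more) | 4 | $P_4 = P(X \ge 3) = 1 - [P_1 + P_2 + P_3] = 1 - (0.4724 + 0.3543 + 0.1329) = 0.0404$ | $E_4 = 60(0.0404) = 2.424$ |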
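Example 2: solution (continued)

| No. of defects | Observed frequencies, O | Expected frequencies, E |
|---|---|---|
| 0 | 32 | 28.344 |
| 1 | 15 | 21.258 |
| 2 | 9 | 7.974 |
| 3 (or more) | 4 | 2.424 |

The last expected frequency is less than 5 (E = 2.424 < 5). Combine it with the adjacent category and reconstruct the table:

| No. of defects | Observed frequencies, O | Expected frequencies, E |
|---|---|---|
| 0 | 32 | 28.344 |
| 1 | 15 | 21.258 |
| 2 (or more) | 13 | 10.398 |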
Example 2: solution (continued)
Using the combined table (k = 3 categories):
$$\chi^2_{test} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(32 - 28.344)^2}{28.344} + \frac{(15 - 21.258)^2}{21.258} + \frac{(13 - 10.398)^2}{10.398} = 2.965$$
$$\chi^2_{critical} = \chi^2_{\alpha, k-p-1} = \chi^2_{0.05, 3-1-1} = \chi^2_{0.05, 1} = 3.8415$$
Since $\chi^2_{test} = 2.965 < \chi^2_{0.05, 1} = 3.8415$, we do not reject H0.
At α = 0.05, there is not enough evidence to reject the hypothesis that the number of defects in printed circuit boards follows a Poisson distribution.
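The Poisson fit of Example 2 can be reproduced with the Python standard library. This is a sketch under the example's own assumptions; the variable names are illustrative and the critical value is the table value quoted above. Because the code keeps full precision instead of rounding each probability to four decimals, the statistic differs slightly in the third decimal place from the hand calculation.

```python
import math

defects = [0, 1, 2, 3]
observed = [32, 15, 9, 4]
n = sum(observed)                                          # 60 boards

# Estimate lambda by the sample mean (one estimated parameter, p = 1)
lam = sum(x * o for x, o in zip(defects, observed)) / n    # 0.75

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# P(X=0), P(X=1), P(X=2), and the remainder P(X>=3) for the open-ended last class
probs = [poisson_pmf(x, lam) for x in (0, 1, 2)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# The last class has E < 5, so combine the last two classes as in the text
observed_pooled = observed[:2] + [observed[2] + observed[3]]
expected_pooled = expected[:2] + [expected[2] + expected[3]]

chi2_test = sum((o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled))
df = len(observed_pooled) - 1 - 1     # k - p - 1 with k = 3, p = 1
chi2_critical = 3.8415                # table value for alpha = 0.05, df = 1
reject_h0 = chi2_test > chi2_critical

print(f"lambda = {lam}, df = {df}, test statistic = {chi2_test:.3f}, reject H0: {reject_h0}")
```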
Example 3
A farmer kept a record of the number of heifer calves born to each of his cows during the first five years. The results are summarized below.

| No. of heifers | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| No. of cows | 4 | 19 | 41 | 52 | 26 | 8 |

Test, at the 5% level of significance, whether these data are adequately described by a binomial distribution with parameters n = 5 and p = 0.5.
The parameters n = 5 and p = 0.5 are given rather than estimated from the sample, thus the number of estimated parameters is 0.
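Example 3: solution
H0: The numbers of heifer calves born to each of his cows follow a binomial distribution with n = 5 and p = 0.5.
H1: The numbers of heifer calves born to each of his cows do not follow a binomial distribution with n = 5 and p = 0.5.

Probability: $P_i = P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
Expected frequencies: $E_i = 150 P_i$ (150 cows in the sample)

$P_1 = P(X = 0) = \binom{5}{0}(0.5)^0(0.5)^5 = 0.0313$, $E_1 = 150(0.0313) = 4.695$
$P_2 = P(X = 1) = \binom{5}{1}(0.5)^1(0.5)^4 = 0.1563$, $E_2 = 150(0.1563) = 23.445$
$P_3 = P(X = 2) = \binom{5}{2}(0.5)^2(0.5)^3 = 0.3125$, $E_3 = 150(0.3125) = 46.875$
$P_4 = P(X = 3) =$ ____, $E_4 =$ ____
$P_5 = P(X = 4) =$ ____, $E_5 =$ ____
$P_6 = P(X = 5) =$ ____, $E_6 =$ ____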
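Example 3: solution (continued)

| No. of heifers | Observed frequencies, O | Expected frequencies, E |
|---|---|---|
| 0 | 4 | 4.695 |
| 1 | 19 | 23.445 |
| 2 | 41 | 46.875 |
| 3 | 52 | 46.875 |
| 4 | 26 | 23.445 |
| 5 | 8 | 4.695 |

$\chi^2_{test} =$ ____
$\chi^2_{0.05, k-p-1} =$ ____
Decision: ____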
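One way to work through the remaining steps of Example 3 is sketched below in Python. Several choices here are assumptions rather than the chapter's worked answer: the first two and last two classes are combined because their outer expected frequencies fall below 5, exact binomial probabilities are used instead of four-decimal roundings, and the critical value is the standard chi-square table value for α = 0.05 with 3 degrees of freedom.

```python
import math

heifers = [0, 1, 2, 3, 4, 5]
observed = [4, 19, 41, 52, 26, 8]
cows = sum(observed)          # 150 cows
n_trials, p_success = 5, 0.5  # both given, so no parameters are estimated

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

expected = [cows * binomial_pmf(x, n_trials, p_success) for x in heifers]

# E for 0 heifers and for 5 heifers is below 5: combine each with its neighbour
observed_pooled = [observed[0] + observed[1], observed[2], observed[3],
                   observed[4] + observed[5]]
expected_pooled = [expected[0] + expected[1], expected[2], expected[3],
                   expected[4] + expected[5]]

chi2_test = sum((o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled))
df = len(observed_pooled) - 0 - 1     # k - p - 1 with k = 4 pooled classes, p = 0
chi2_critical = 7.8147                # table value for alpha = 0.05, df = 3
reject_h0 = chi2_test > chi2_critical

print(f"df = {df}, test statistic = {chi2_test:.4f}, reject H0: {reject_h0}")
```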
Example 4
The sugar concentrations in apple juice measured at 20 °C were reported in an article of Food Testing & Analysis for 50 readings in the frequency distribution table below.

| Class interval (sugar concentration) | 1.0-1.2 | 1.3-1.5 | 1.6-1.8 | 1.9-2.1 |
|---|---|---|---|---|
| Observed frequency | 10 | 15 | 15 | 10 |

At the 2.5% level of significance, is there any evidence to support the assumption that the sugar concentration is normally distributed with μ = 1.5 and σ = 0.5?
The parameters μ = 1.5 and σ = 0.5 are given rather than estimated from the sample, thus the number of estimated parameters is p = 0.
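Example 4: solution
H0: The sugar concentration in clear apple juice is normally distributed.
H1: The sugar concentration in clear apple juice is not normally distributed.

Using the class boundaries and $Z = \frac{X - 1.5}{0.5}$:
$$P(0.95 < X < 1.25) = P\left(\frac{0.95 - 1.5}{0.5} < Z < \frac{1.25 - 1.5}{0.5}\right) = P(-1.1 < Z < -0.5) = 0.1728$$
$$P(1.25 < X < 1.55) = P\left(\frac{1.25 - 1.5}{0.5} < Z < \frac{1.55 - 1.5}{0.5}\right) = P(-0.5 < Z < 0.1) = 0.2313$$
$$P(1.55 < X < 1.85) = P\left(\frac{1.55 - 1.5}{0.5} < Z < \frac{1.85 - 1.5}{0.5}\right) = P(0.1 < Z < 0.7) = 0.2182$$
$$P(1.85 < X < 2.15) = P\left(\frac{1.85 - 1.5}{0.5} < Z < \frac{2.15 - 1.5}{0.5}\right) = P(0.7 < Z < 1.3) = 0.1452$$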
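Example 4: solution (continued)

| Class interval | Observed frequency | Class boundaries | Expected frequency |
|---|---|---|---|
| 1.0-1.2 | 10 | 0.95-1.25 | 50(0.1728) = 8.64 |
| 1.3-1.5 | 15 | 1.25-1.55 | 50(0.2313) = 11.565 |
| 1.6-1.8 | 15 | 1.55-1.85 | 50(0.2182) = 10.91 |
| 1.9-2.1 | 10 | 1.85-2.15 | 50(0.1452) = 7.26 |

With k = 4 classes and p = 0 estimated parameters, the degrees of freedom are k - p - 1 = 3.
Since $\chi^2_{test} = 3.8017 < \chi^2_{0.025, 3} = 9.3484$, we do not reject H0.
At α = 0.025, there is not enough evidence to reject the hypothesis that the sugar concentration in apple juice is normally distributed.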
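The class probabilities and the test value of Example 4 can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch using the example's stated boundaries; the variable names are illustrative and the critical value is the table value quoted in the solution. Since the normal probabilities are not rounded to four decimals here, the statistic agrees with the hand calculation only to about two decimal places.

```python
from statistics import NormalDist

observed = [10, 15, 15, 10]
boundaries = [0.95, 1.25, 1.55, 1.85, 2.15]   # class boundaries from the table
n = sum(observed)                             # 50 readings

# mu and sigma are given, so no parameters are estimated (p = 0)
dist = NormalDist(mu=1.5, sigma=0.5)

# Probability of each class = difference of the CDF at its two boundaries
probs = [dist.cdf(hi) - dist.cdf(lo) for lo, hi in zip(boundaries, boundaries[1:])]
expected = [n * p for p in probs]

chi2_test = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1    # k - p - 1 = 3
chi2_critical = 9.3484        # table value for alpha = 0.025, df = 3
reject_h0 = chi2_test > chi2_critical

print([round(p, 4) for p in probs])
print(f"df = {df}, test statistic = {chi2_test:.3f}, reject H0: {reject_h0}")
```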
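6.2 CONTINGENCY TABLE
The contingency table is called an r x c contingency table (r categories for the row variable and c categories for the column variable). We are interested in finding out whether the row variable is independent of the column variable.

| Row variable, i | Column variable, j = 1 | Column variable, j = 2 | Total |
|---|---|---|---|
| i = 1 | $O_{11}$ | $O_{12}$ | $n_{1.}$ |
| i = 2 | $O_{21}$ | $O_{22}$ | $n_{2.}$ |
| Total | $n_{.1}$ | $n_{.2}$ | $n_{..}$ |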
The Test Statistic

$$\chi^2_{test} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{v}$$

where
$O_{ij}$ = the observed frequency in cell (i, j)
$E_{ij}$ = the expected frequency in cell (i, j)
$i$ = level of the first classification method (row variable)
$j$ = level of the second classification method (column variable)
degrees of freedom, $v = (r - 1)(c - 1)$
The Expected Frequency
With row totals $n_{i.}$, column totals $n_{.j}$ and grand total $n_{..}$, the expected frequency in cell (i, j) is

$$E_{ij} = \frac{n_{i.} \times n_{.j}}{n_{..}}$$
6.2.1 THE CHI-SQUARE INDEPENDENCE TEST
Used to test the independence of two variables.

Hypothesis: Null and Alternative
H0: The row and column variables are independent, that is, not related to each other (x has no relationship with y).
H1: The row and column variables are dependent, that is, related to each other (x has a relationship with y).
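Procedures
1. State the hypotheses and identify the claim.
2. Compute the test value, $\chi^2_{test} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.
3. Find the critical value, $\chi^2_{\alpha, (r-1)(c-1)}$.
4. Make the decision: reject H0 if $\chi^2_{test} > \chi^2_{\alpha, (r-1)(c-1)}$.
5. Draw a conclusion to reject or accept the claim.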
Example 5: Chi-Square Independence Test
The data below show the number of insomnia patients according to their smoking habit in Malaysia.

| | Smoking | Not smoking |
|---|---|---|
| Insomnia | 20 | 40 |
| Not insomnia | 10 | 80 |

At α = 0.01, can we say that insomnia is independent of smoking habit?
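Example 5: solution
H0: Insomnia is independent of smoking habit (claim).
H1: Insomnia is dependent on smoking habit.

| | Smoking | Not smoking | $n_{i.}$ |
|---|---|---|---|
| Insomnia | 20 | 40 | $n_{1.} = 60$ |
| Not insomnia | 10 | 80 | $n_{2.} = 90$ |
| $n_{.j}$ | $n_{.1} = 30$ | $n_{.2} = 120$ | $n_{..} = 150$ |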
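Example 5: solution (continued)

| $O_{ij}$ | $E_{ij} = \frac{n_{i.} \times n_{.j}}{n_{..}}$ | $\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ |
|---|---|---|
| $O_{11} = 20$ | $E_{11} = \frac{60 \times 30}{150} = 12$ | $\frac{(20 - 12)^2}{12} = 5.3333$ |
| $O_{12} = 40$ | $E_{12} = \frac{60 \times 120}{150} = 48$ | $\frac{(40 - 48)^2}{48} = 1.3333$ |
| $O_{21} = 10$ | $E_{21} = \frac{90 \times 30}{150} = 18$ | $\frac{(10 - 18)^2}{18} = 3.5556$ |
| $O_{22} = 80$ | $E_{22} = \frac{90 \times 120}{150} = 72$ | $\frac{(80 - 72)^2}{72} = 0.8889$ |

$$\chi^2_{test} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 5.3333 + 1.3333 + 3.5556 + 0.8889 = 11.1111$$
$$\chi^2_{critical} = \chi^2_{0.01, (1)(1)} = \chi^2_{0.01, 1} = 6.6349$$
Since $\chi^2_{test} = 11.1111 > \chi^2_{0.01, 1} = 6.6349$, we reject H0.
At α = 0.01, there is sufficient evidence to conclude that insomnia is not independent of (that is, it is dependent on) smoking habit.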
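The expected frequencies and the test value of Example 5 can be verified with a short Python sketch. The function name `chi_square_contingency` is an illustrative choice and not a library call; the critical value is the table value quoted in the solution.

```python
def chi_square_contingency(table):
    """Return (test statistic, degrees of freedom, expected table) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E_ij = (row total i) * (column total j) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: insomnia, not insomnia. Columns: smoking, not smoking.
table = [[20, 40],
         [10, 80]]

chi2_test, df, expected = chi_square_contingency(table)
chi2_critical = 6.6349        # table value for alpha = 0.01, df = 1
reject_h0 = chi2_test > chi2_critical

print(expected)
print(f"df = {df}, test statistic = {chi2_test:.4f}, reject H0: {reject_h0}")
```

The function accepts any table given as a list of rows, so it applies unchanged to tables larger than 2 x 2.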
6.2.2 TEST FOR HOMOGENEITY OF PROPORTIONS
This test concerns the homogeneity, or similarity, of two or more population proportions with regard to the distribution of a certain characteristic. The procedure is the same as the procedure used for the test of independence.

Hypothesis: Null and Alternative
H0: $p_1 = p_2 = \ldots = p_n$
H1: $p_i \ne p_j$ for at least one pair $i \ne j$
OR
H0: All proportions are the same.
H1: At least one proportion is different from the others.
Example 6: Homogeneity Test for Proportions
A researcher selected a sample of 50 seniors from each of three area secondary schools and asked each student, "Do you come to school on your own or sent by your parents?". The data are shown in the table.

| | SCHOOL 1 | SCHOOL 2 | SCHOOL 3 |
|---|---|---|---|
| Yes | 18 | 22 | 16 |
| No | 32 | 28 | 34 |

At α = 0.05, test the claim that the proportion of students who come to school on their own or are sent by their parents is the same for all schools.
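Example 6: solution
H0: All proportions are the same.
H1: At least one proportion is different from the others.
OR
H0: $p_1 = p_2 = p_3$
H1: $p_i \ne p_j$ for at least one pair $i \ne j$

| | School 1 | School 2 | School 3 | $n_{i.}$ |
|---|---|---|---|---|
| Yes | 18 | 22 | 16 | $n_{1.} = 56$ |
| No | 32 | 28 | 34 | $n_{2.} = 94$ |
| $n_{.j}$ | $n_{.1} = 50$ | $n_{.2} = 50$ | $n_{.3} = 50$ | $n_{..} = 150$ |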
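Example 6: solution (continued)

| $O_{ij}$ | $E_{ij} = \frac{n_{i.} \times n_{.j}}{n_{..}}$ | $\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ |
|---|---|---|
| $O_{11} = 18$ | $E_{11} = \frac{56 \times 50}{150} = 18.6667$ | $\frac{(18 - 18.6667)^2}{18.6667} = 0.0238$ |
| $O_{12} = 22$ | $E_{12} = \frac{56 \times 50}{150} = 18.6667$ | $\frac{(22 - 18.6667)^2}{18.6667} = 0.5952$ |
| $O_{13} = 16$ | $E_{13} = \frac{56 \times 50}{150} = 18.6667$ | $\frac{(16 - 18.6667)^2}{18.6667} = 0.3810$ |
| $O_{21} = 32$ | $E_{21} = \frac{94 \times 50}{150} = 31.3333$ | $\frac{(32 - 31.3333)^2}{31.3333} = 0.0142$ |
| $O_{22} = 28$ | $E_{22} = \frac{94 \times 50}{150} = 31.3333$ | $\frac{(28 - 31.3333)^2}{31.3333} = 0.3546$ |
| $O_{23} = 34$ | $E_{23} = \frac{94 \times 50}{150} = 31.3333$ | $\frac{(34 - 31.3333)^2}{31.3333} = 0.2270$ |

$$\chi^2_{test} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 0.0238 + 0.5952 + 0.3810 + 0.0142 + 0.3546 + 0.2270 = 1.5958$$
With r = 2 and c = 3, the degrees of freedom are (2 - 1)(3 - 1) = 2.
Since $\chi^2_{test} = 1.5958 < \chi^2_{0.05, 2} = 5.9915$, we do not reject H0.
At α = 0.05, there is not enough evidence to reject the claim that the proportion of students who come to school on their own or are sent by their parents is the same for all schools.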
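Example 6 can also be read directly in terms of proportions: under H0 every school shares one pooled "Yes" proportion, and each expected count is the school's sample size times that pooled proportion, which gives the same values as the row total times column total formula. The Python sketch below takes that view; the variable names are illustrative and the critical value is the table value quoted in the solution.

```python
yes = [18, 22, 16]     # "Yes" counts for schools 1, 2, 3
no = [32, 28, 34]      # "No" counts for schools 1, 2, 3

school_sizes = [y + n for y, n in zip(yes, no)]              # 50 students per school
sample_props = [y / s for y, s in zip(yes, school_sizes)]    # observed "Yes" proportions
pooled_yes = sum(yes) / sum(school_sizes)                    # 56 / 150 under H0

chi2_test = 0.0
for y, n, size in zip(yes, no, school_sizes):
    e_yes = size * pooled_yes          # expected "Yes" count if proportions are equal
    e_no = size * (1 - pooled_yes)     # expected "No" count if proportions are equal
    chi2_test += (y - e_yes) ** 2 / e_yes + (n - e_no) ** 2 / e_no

df = (2 - 1) * (len(yes) - 1)          # (r - 1)(c - 1) = 2
chi2_critical = 5.9915                 # table value for alpha = 0.05, df = 2
reject_h0 = chi2_test > chi2_critical

print(sample_props, round(pooled_yes, 4))
print(f"df = {df}, test statistic = {chi2_test:.4f}, reject H0: {reject_h0}")
```

The observed proportions (0.36, 0.44 and 0.32) all lie close to the pooled value, which is why the test statistic is small.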