CHAPTER 6 GOODNESS OF FIT AND CONTINGENCY TABLE PREPARED BY: DR SITI ZANARIAH SATARI & FARAHANIM MISNI


Expected Outcomes
Able to test the goodness of fit for categorical data.
Able to test whether categorical data fit a given distribution such as the Binomial, Normal or Poisson.
Able to use a contingency table to test for independence and for homogeneity of proportions.

Contents
6.1 Goodness of Fit Test
6.1.1 Goodness of Fit Test for Categorical Data
6.1.2 Fitting of the Distribution
6.2 Contingency Table
6.2.1 Testing for Independence between Two Variables
6.2.2 Test of Homogeneity of Proportions

6.1 GOODNESS OF FIT TEST
When to use the Chi-Square distribution?
1. Find a confidence interval for a variance or standard deviation.
2. Test a hypothesis about a single variance or standard deviation.
3. Tests concerning frequency distributions for categorical data (Goodness of Fit).
4. Tests concerning probability distributions (Goodness of Fit).
5. Test the independence of two variables (Contingency Table).
6. Test the homogeneity of proportions (Contingency Table).

When to use the goodness of fit test?
1. To compare observed and expected frequencies for categorical data.
Example: To meet customer demands, a manufacturer of running shoes may wish to see whether buyers show a preference for a specific style. If there were no preference, one would expect each style to be selected with equal frequency.
2. When you have some practical data and you want to know how well a particular statistical distribution (such as the Poisson, binomial or normal model) fits the data.
Example: A researcher wishes to test whether the number of children in a family follows a Poisson distribution.

6.1.1 GOODNESS OF FIT TEST FOR CATEGORICAL DATA
Hypotheses: Null and Alternative
H0: There is no difference, no change or no preference.
H1: There is a difference, change or preference.
Or
H0: State the claim about the categorical distribution.
H1: The categorical distribution is not the same as stated in H0.
Example:
H0: Buyers show no preference for a specific style.
H1: Buyers show a preference for a specific style.

Assumptions/Conditions
1. The data are obtained from a random sample.
2. The variable under study is categorical.
3. The expected frequency for each category must be at least 5. If an expected frequency is less than 5, combine the category with an adjacent category.

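The Test Statistic
$$\chi^2_{\text{test}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where
$O_i$ = observed frequency for category $i$
$E_i$ = expected frequency for category $i$
$k$ = the number of categories
degrees of freedom, $\nu = k - 1$
and $E_i = n p_i$, where $p_i$ is the probability for category $i$, $i = 1, 2, \ldots, k$.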

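Procedures
1. State the hypotheses and identify the claim.
2. Compute the test statistic value $\chi^2_{\text{test}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$.
3. Find the critical value $\chi^2_{\alpha,\,k-1}$. The test is always right-tailed, since the terms $(O_i - E_i)^2$ are squared and therefore never negative.
4. Make the decision: reject H0 if $\chi^2_{\text{test}} > \chi^2_{\alpha,\,k-1}$.
5. Draw a conclusion to reject or accept the claim.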

Why is this test called goodness of fit?
If the observed values and expected values are plotted together, one can see whether the values are close together or far apart.
When the observed and expected values are close together, the chi-square test value will be small. The decision is then not to reject H0, hence there is a good fit.
When the observed and expected values are far apart, the chi-square test value will be large. The decision is then to reject H0 (accept H1), hence there is not a good fit.

Example 1: GoF for Categorical Data
A market analyst wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data.

Flavor       Cherry   Strawberry   Orange   Lime   Grape
Frequency      32         28         16      14     10

Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors at the 0.05 significance level?

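Example 1: solution
H0: There is no preference in the selection of fruit soda flavours (claim).
H1: There is a preference in the selection of fruit soda flavours.

Under H0 each flavour is equally likely, so $E_i = n p_i = 100\left(\tfrac{1}{5}\right) = 20$.

Frequency        Cherry   Strawberry   Orange   Lime   Grape
Observed (O)       32         28         16      14     10
Expected (E)       20         20         20      20     20

$$\chi^2_{\text{test}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(32-20)^2}{20} + \frac{(28-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(10-20)^2}{20} = 18.0$$

$$\chi^2_{\text{critical}} = \chi^2_{\alpha,\,k-1} = \chi^2_{0.05,\,4} = 9.4877$$

Since $\chi^2_{\text{test}} = 18.0 > \chi^2_{0.05,\,4} = 9.4877$, we reject H0.
At $\alpha = 0.05$, there is enough evidence to reject the claim that there is no preference in the selection of fruit soda flavours.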
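The arithmetic of Example 1 can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the helper name chi_square_stat is hypothetical, and the critical value is copied from the chi-square table rather than computed.

```python
# Chi-square goodness-of-fit statistic for Example 1 (five soda flavours).
observed = [32, 28, 16, 14, 10]   # Cherry, Strawberry, Orange, Lime, Grape
n = sum(observed)                 # 100 respondents
k = len(observed)                 # 5 categories
expected = [n / k] * k            # no preference: 20 per flavour


def chi_square_stat(obs, exp):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


chi2_test = chi_square_stat(observed, expected)
chi2_critical = 9.4877            # table value for alpha = 0.05, df = k - 1 = 4
reject_h0 = chi2_test > chi2_critical

print(f"chi2_test = {chi2_test:.1f}, reject H0: {reject_h0}")
```

The same helper works for any set of categories once the expected frequencies are supplied.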

6.1.2 FITTING OF DISTRIBUTION
Hypotheses: Null and Alternative
H0: The population of a set of observed data comes from a specific distribution (Poisson/Binomial/Normal).
H1: The population of a set of observed data does not come from a specific distribution (Poisson/Binomial/Normal).
Example:
H0: The number of children in a family follows a Poisson distribution.
H1: The number of children in a family does not follow a Poisson distribution.

NOTES
1. The expected frequency for each category must be at least 5. If an expected frequency is less than 5, combine the category with an adjacent category.
2. Reject H0 if $\chi^2_{\text{test}} > \chi^2_{\alpha,\,k-p-1}$, where $p$ is the number of parameters of the hypothesized distribution that are estimated from sample statistics.

Procedures
1. State the hypotheses and identify the claim.
2. Compute the test value $\chi^2_{\text{test}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$. If an expected frequency is less than 5, it should be combined with the expected frequency in the adjacent class interval.
3. Find the critical value. The test is always right-tailed, since the terms $(O_i - E_i)^2$ are squared and therefore never negative.
4. Make the decision: reject H0 if $\chi^2_{\text{test}} > \chi^2_{\alpha,\,k-p-1}$, where $p$ is the number of parameters of the hypothesized distribution that are estimated from sample statistics.
5. Draw a conclusion to reject or accept the claim.

Example 2: GoF for Fitting a Distribution
The number of defects in printed circuit boards is hypothesized to follow a Poisson distribution. A random sample of 60 printed boards has been collected and the following numbers of defects observed.

Number of defects    Observed frequency
       0                    32
       1                    15
       2                     9
       3                     4

Test the hypothesis that the number of defects in printed circuit boards follows a Poisson distribution at α = 0.05.

Example 2: solution
H0: The number of defects in printed circuit boards follows a Poisson distribution.
H1: The number of defects in printed circuit boards does not follow a Poisson distribution.

For the Poisson distribution, find the average value from the sample:
$$\hat{\lambda} = \frac{0(32) + 1(15) + 2(9) + 3(4)}{60} = 0.75$$
The value of λ is estimated from the data, so the number of estimated parameters is p = 1.

With $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$ and $E_i = nP_i$:
$P_1 = P(X = 0) = \dfrac{e^{-0.75}(0.75)^0}{0!} = 0.4724$, $E_1 = 60(0.4724) = 28.344$
$P_2 = P(X = 1) = \dfrac{e^{-0.75}(0.75)^1}{1!} = 0.3543$, $E_2 = 60(0.3543) = 21.258$
$P_3 = P(X = 2) = \dfrac{e^{-0.75}(0.75)^2}{2!} = 0.1329$, $E_3 = 60(0.1329) = 7.974$
$P_4 = P(X \ge 3) = 1 - [P_1 + P_2 + P_3] = 1 - 0.4724 - 0.3543 - 0.1329 = 0.0404$, $E_4 = 60(0.0404) = 2.424$

No. of defects    Observed frequency O    Expected frequency E
0                         32                    28.344
1                         15                    21.258
2                          9                     7.974
3 (or more)                4                     2.424

Since $E_4 = 2.424 < 5$, combine it with the adjacent category and reconstruct the table.

No. of defects    Observed frequency O    Expected frequency E
0                         32                    28.344
1                         15                    21.258
2 (or more)               13                    10.398

$$\chi^2_{\text{test}} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \frac{(32-28.344)^2}{28.344} + \frac{(15-21.258)^2}{21.258} + \frac{(13-10.398)^2}{10.398} = 2.965$$

$$\chi^2_{\text{critical}} = \chi^2_{\alpha,\,k-p-1} = \chi^2_{0.05,\,3-1-1} = \chi^2_{0.05,\,1} = 3.8415$$

Since $\chi^2_{\text{test}} = 2.965 < \chi^2_{0.05,\,1} = 3.8415$, we do not reject H0.
At $\alpha = 0.05$, there is not enough evidence to reject the hypothesis that the number of defects in printed circuit boards follows a Poisson distribution.
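The steps of Example 2 (estimate λ, compute Poisson probabilities, pool a small tail class, compute the statistic) can be sketched in Python as follows. The helper pool_small_tail is a hypothetical name introduced here for illustration, and the critical value is taken from the table. Because the code keeps full precision instead of four-decimal probabilities, its statistic comes out at about 2.96 rather than exactly 2.965.

```python
import math

defects = [0, 1, 2, 3]            # the last class is read as "3 or more"
observed = [32, 15, 9, 4]
n = sum(observed)                 # 60 boards

# Estimate lambda by the sample mean (one estimated parameter, p = 1).
lam = sum(x * o for x, o in zip(defects, observed)) / n

# Poisson probabilities for 0, 1, 2 and the remaining tail "3 or more".
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in defects[:-1]]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]


def pool_small_tail(obs, exp, minimum=5.0):
    """Merge the last class into its neighbour while its expected count < minimum."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


obs_pooled, exp_pooled = pool_small_tail(observed, expected)
chi2_test = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1      # k - p - 1 with p = 1
chi2_critical = 3.8415            # table value for alpha = 0.05, df = 1
reject_h0 = chi2_test > chi2_critical

print(f"lambda = {lam}, df = {df}, chi2_test = {chi2_test:.3f}, reject H0: {reject_h0}")
```

Pooling is done before the degrees of freedom are counted, which is why k = 3 here and not 4.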

Example 3
A farmer kept a record of the number of heifer calves born to each of his cows during the first five years. The results are summarized below.

No. of heifers    0    1    2    3    4    5
No. of cows       4   19   41   52   26    8

Test at the 5% level of significance whether these data are consistent with a binomial distribution with parameters n = 5 and p = 0.5.
The binomial parameters n = 5 and p = 0.5 are given, so the number of estimated parameters is p = 0.

Example 3: solution
H0: The numbers of heifer calves born to each of his cows follow a binomial distribution.
H1: The numbers of heifer calves born to each of his cows do not follow a binomial distribution.

Probability, $P_i = P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
Expected frequencies, $E_i = nP_i$, with $n = 150$ cows.

$P_1 = P(X = 0) = \binom{5}{0}(0.5)^0(0.5)^5 = 0.0313$, $E_1 = 150(0.0313) = 4.695$
$P_2 = P(X = 1) = \binom{5}{1}(0.5)^1(0.5)^4 = 0.1563$, $E_2 = 150(0.1563) = 23.445$
$P_3 = P(X = 2) = \binom{5}{2}(0.5)^2(0.5)^3 = 0.3125$, $E_3 = 150(0.3125) = 46.875$
$P_4 = P(X = 3) = 0.3125$, $E_4 = 46.875$
$P_5 = P(X = 4) = 0.1563$, $E_5 = 23.445$
$P_6 = P(X = 5) = 0.0313$, $E_6 = 4.695$

No. of heifers    Observed frequency O    Expected frequency E
0                          4                     4.695
1                         19                    23.445
2                         41                    46.875
3                         52                    46.875
4                         26                    23.445
5                          8                     4.695

$\chi^2_{\text{test}} =$
$\chi^2_{0.05,\,k-p-1} =$
Decision:
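The slides leave the final computation of Example 3 open. The sketch below shows one way to carry it out, assuming the two end classes (expected count below 5) are each merged with their neighbour, as the rule in the notes requires. The helper pool_ends is hypothetical, the critical value 7.8147 is the standard table value for α = 0.05 with 3 degrees of freedom, and exact binomial probabilities are used, so the expected counts differ slightly from the rounded ones in the table (4.6875 instead of 4.695).

```python
import math

heifers = [0, 1, 2, 3, 4, 5]
observed = [4, 19, 41, 52, 26, 8]
n_cows = sum(observed)                    # 150 cows
n_trials, p_success = 5, 0.5              # given, so no parameter is estimated

# Binomial probabilities P(X = x) and expected frequencies E = n * P.
probs = [math.comb(n_trials, x) * p_success ** x * (1 - p_success) ** (n_trials - x)
         for x in heifers]
expected = [n_cows * p for p in probs]


def pool_ends(obs, exp, minimum=5.0):
    """Merge end classes into their neighbours while their expected count < minimum."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 2 and exp[0] < minimum:
        first_o, first_e = obs.pop(0), exp.pop(0)
        obs[0] += first_o
        exp[0] += first_e
    while len(exp) > 2 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


obs_pooled, exp_pooled = pool_ends(observed, expected)
chi2_test = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 0 - 1              # k - p - 1 with no estimated parameters
chi2_critical = 7.8147                    # table value for alpha = 0.05, df = 3
reject_h0 = chi2_test > chi2_critical

print(obs_pooled, exp_pooled)
print(f"df = {df}, chi2_test = {chi2_test:.4f}, reject H0: {reject_h0}")
```

Under these assumptions the classes become 0-1, 2, 3 and 4-5, and the statistic falls below the critical value.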

Example 4
The sugar concentrations in apple juice measured at 20 °C were reported in an article in Food Testing & Analysis for 50 readings, in the frequency distribution table below.

Class interval (sugar concentration)    1.0-1.2    1.3-1.5    1.6-1.8    1.9-2.1
Observed frequency                         10         15         15         10

At the 2.5% level of significance, is there any evidence to support the assumption that the sugar concentration is normally distributed with μ = 1.5 and σ = 0.5?
The parameters μ = 1.5 and σ = 0.5 are given, so the number of estimated parameters is p = 0.

Example 4: solution
H0: The sugar concentration in clear apple juice is normally distributed.
H1: The sugar concentration in clear apple juice is not normally distributed.

$P(0.95 < X < 1.25) = P\left(\frac{0.95-1.5}{0.5} < Z < \frac{1.25-1.5}{0.5}\right) = P(-1.1 < Z < -0.5) = 0.1728$
$P(1.25 < X < 1.55) = P\left(\frac{1.25-1.5}{0.5} < Z < \frac{1.55-1.5}{0.5}\right) = P(-0.5 < Z < 0.1) = 0.2313$
$P(1.55 < X < 1.85) = P\left(\frac{1.55-1.5}{0.5} < Z < \frac{1.85-1.5}{0.5}\right) = P(0.1 < Z < 0.7) = 0.2182$
$P(1.85 < X < 2.15) = P\left(\frac{1.85-1.5}{0.5} < Z < \frac{2.15-1.5}{0.5}\right) = P(0.7 < Z < 1.3) = 0.1452$

Class interval    Observed frequency    Class boundaries    Expected frequency
1.0-1.2                  10               0.95-1.25         50(0.1728) = 8.64
1.3-1.5                  15               1.25-1.55         50(0.2313) = 11.565
1.6-1.8                  15               1.55-1.85         50(0.2182) = 10.91
1.9-2.1                  10               1.85-2.15         50(0.1452) = 7.26

Since $\chi^2_{\text{test}} = 3.8017 < \chi^2_{0.025,\,3} = 9.3484$, we do not reject H0.
At $\alpha = 0.025$, there is not enough evidence to reject the hypothesis that the sugar concentration in apple juice is normally distributed.
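The class probabilities and the test value of Example 4 can be reproduced with the standard normal CDF. The sketch below is an illustration only: normal_cdf is a hypothetical helper built on math.erf, and the critical value is copied from the table. As in the slides, the tails outside 0.95 and 2.15 are ignored, so the four probabilities do not add up to 1.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


boundaries = [0.95, 1.25, 1.55, 1.85, 2.15]   # class boundaries
observed = [10, 15, 15, 10]
n = sum(observed)                             # 50 readings
mu, sigma = 1.5, 0.5                          # given, so no parameter is estimated

# Probability of each class = CDF(upper boundary) - CDF(lower boundary).
probs = [normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
         for lo, hi in zip(boundaries, boundaries[1:])]
expected = [n * p for p in probs]

chi2_test = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1                    # k - p - 1 = 3
chi2_critical = 9.3484                        # table value for alpha = 0.025, df = 3
reject_h0 = chi2_test > chi2_critical

print([round(p, 4) for p in probs])
print(f"df = {df}, chi2_test = {chi2_test:.2f}, reject H0: {reject_h0}")
```

All four expected frequencies exceed 5, so no classes need to be combined here.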

6.2 CONTINGENCY TABLE
The contingency table is called an r x c contingency table (r categories for the row variable and c categories for the column variable). We are interested in finding out whether the row variable is independent of the column variable.

                      Column variable, j
Row variable, i       1        2        Total
1                     O11      O12      n1.
2                     O21      O22      n2.
Total                 n.1      n.2      n..

The Test Statistic
$$\chi^2_{\text{test}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{\nu}$$
where
$O_{ij}$ = the observed frequency in cell $(i, j)$
$E_{ij}$ = the expected frequency in cell $(i, j)$
$i$ = level of the first classification method (row variable)
$j$ = level of the second classification method (column variable)
degrees of freedom, $\nu = (r - 1)(c - 1)$

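The Expected Frequency
$$E_{ij} = \frac{n_{i\cdot} \times n_{\cdot j}}{n_{\cdot\cdot}}$$
where $n_{i\cdot}$ is the total of row $i$, $n_{\cdot j}$ is the total of column $j$ and $n_{\cdot\cdot}$ is the grand total of the contingency table.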

6.2.1 THE CHI-SQUARE INDEPENDENCE TEST
To test the independence of two variables.
Hypotheses: Null and Alternative
H0: The row and column variables are independent of (not related to) each other (x has no relationship with y).
H1: The row and column variables are dependent on (related to) each other (x has a relationship with y).

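Procedures
1. State the hypotheses and identify the claim.
2. Compute the test value $\chi^2_{\text{test}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.
3. Find the critical value $\chi^2_{\alpha,\,(r-1)(c-1)}$.
4. Make the decision: reject H0 if $\chi^2_{\text{test}} > \chi^2_{\alpha,\,(r-1)(c-1)}$.
5. Draw a conclusion to reject or accept the claim.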

Example 5: Chi-Square Independence Test
The data below show the number of insomnia patients according to their smoking habit in Malaysia.

                 Smoking habit    Not smoking
Insomnia              20               40
Not insomnia          10               80

At α = 0.01, can we say that insomnia is independent of smoking habit?

Example 5: solution
H0: Insomnia is independent of smoking habit (claim).
H1: Insomnia is dependent on smoking habit.

                 Smoking habit    Not smoking    Total
Insomnia              20               40        n1. = 60
Not insomnia          10               80        n2. = 90
Total              n.1 = 30        n.2 = 120     n.. = 150

O_ij         E_ij = (n_i. x n_.j) / n..      (O_ij - E_ij)^2 / E_ij
O11 = 20     E11 = (60)(30)/150 = 12         (20 - 12)^2 / 12 = 5.3333
O12 = 40     E12 = (60)(120)/150 = 48        (40 - 48)^2 / 48 = 1.3333
O21 = 10     E21 = (90)(30)/150 = 18         (10 - 18)^2 / 18 = 3.5556
O22 = 80     E22 = (90)(120)/150 = 72        (80 - 72)^2 / 72 = 0.8889

$$\chi^2_{\text{test}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 11.1111$$

$$\chi^2_{\text{critical}} = \chi^2_{0.01,\,(1)(1)} = \chi^2_{0.01,\,1} = 6.6349$$

Since $\chi^2_{\text{test}} = 11.1111 > \chi^2_{0.01,\,1} = 6.6349$, we reject H0.
At $\alpha = 0.01$, there is sufficient evidence to conclude that insomnia is not independent of (that is, it is dependent on) smoking habit.
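The expected frequencies and the test value of Example 5 follow directly from the row and column totals. The sketch below shows the computation for a general r x c table stored as a list of rows. The function names expected_counts and chi_square_table are hypothetical, and the critical value is copied from the table.

```python
# Observed 2 x 2 table: rows = insomnia / not insomnia, columns = smoking / not smoking.
observed = [[20, 40],
            [10, 80]]


def expected_counts(table):
    """E_ij = (row total i) * (column total j) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Sum of (O_ij - E_ij)^2 / E_ij over every cell of the table."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, exp)
               for o, e in zip(o_row, e_row))


expected = expected_counts(observed)
chi2_test = chi_square_table(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1) = 1
chi2_critical = 6.6349                              # table value for alpha = 0.01, df = 1
reject_h0 = chi2_test > chi2_critical

print(expected)
print(f"df = {df}, chi2_test = {chi2_test:.4f}, reject H0: {reject_h0}")
```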

6.2.2 TEST FOR HOMOGENEITY OF PROPORTIONS
Concerns the homogeneity, or similarity, of two or more population proportions with regard to the distribution of a certain characteristic.
The procedure is similar to the procedure used for the test of independence discussed above.
Hypotheses: Null and Alternative
H0: $p_1 = p_2 = \cdots = p_n$
H1: $p_i \neq p_j$ for at least one pair $i \neq j$
OR
H0: All proportions are the same.
H1: At least one proportion is different from the others.

Example 6: Homogeneity Test for Proportions
A researcher selected a sample of 50 seniors from each of three area secondary schools and asked each student, "Do you come to school on your own or are you sent by your parents?". The data are shown in the table.

        School 1    School 2    School 3
Yes        18          22          16
No         32          28          34

At α = 0.05, test the claim that the proportion of students who come to school on their own or are sent by their parents is the same for all schools.

Example 6: solution
H0: All proportions are the same (claim).
H1: At least one proportion is different from the others.
OR
H0: $p_1 = p_2 = p_3$
H1: $p_i \neq p_j$ for at least one pair $i \neq j$

         School 1    School 2    School 3    Total
Yes         18          22          16       n1. = 56
No          32          28          34       n2. = 94
Total    n.1 = 50    n.2 = 50    n.3 = 50    n.. = 150

O_ij         E_ij = (n_i. x n_.j) / n..          (O_ij - E_ij)^2 / E_ij
O11 = 18     E11 = (56)(50)/150 = 18.6667        (18 - 18.6667)^2 / 18.6667 = 0.0238
O12 = 22     E12 = (56)(50)/150 = 18.6667        (22 - 18.6667)^2 / 18.6667 = 0.5952
O13 = 16     E13 = (56)(50)/150 = 18.6667        (16 - 18.6667)^2 / 18.6667 = 0.3810
O21 = 32     E21 = (94)(50)/150 = 31.3333        (32 - 31.3333)^2 / 31.3333 = 0.0142
O22 = 28     E22 = (94)(50)/150 = 31.3333        (28 - 31.3333)^2 / 31.3333 = 0.3546
O23 = 34     E23 = (94)(50)/150 = 31.3333        (34 - 31.3333)^2 / 31.3333 = 0.2270

$$\chi^2_{\text{test}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 1.5958$$

Since $\chi^2_{\text{test}} = 1.5958 < \chi^2_{0.05,\,2} = 5.9915$, we do not reject H0.
At $\alpha = 0.05$, there is not enough evidence to reject the claim that the proportion of students who come to school on their own or are sent by their parents is the same for all schools.
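Example 6 can also be read directly in terms of proportions: under H0 every school shares one pooled "Yes" proportion, and the expected counts are that proportion applied to each school's sample size. The sketch below takes this view, which gives the same expected frequencies as the row-total times column-total formula. The variable names are hypothetical and the critical value is copied from the table.

```python
schools = ["School 1", "School 2", "School 3"]
yes_counts = [18, 22, 16]
no_counts = [32, 28, 34]

# Sample proportion of "Yes" in each school, and the pooled proportion under H0.
sample_props = [y / (y + n) for y, n in zip(yes_counts, no_counts)]
pooled_yes = sum(yes_counts) / (sum(yes_counts) + sum(no_counts))   # 56 / 150

chi2_test = 0.0
for y, n in zip(yes_counts, no_counts):
    size = y + n                          # 50 students per school
    e_yes = size * pooled_yes             # expected "Yes" count under H0
    e_no = size * (1.0 - pooled_yes)      # expected "No" count under H0
    chi2_test += (y - e_yes) ** 2 / e_yes + (n - e_no) ** 2 / e_no

df = (2 - 1) * (len(schools) - 1)         # (r - 1)(c - 1) = 2
chi2_critical = 5.9915                    # table value for alpha = 0.05, df = 2
reject_h0 = chi2_test > chi2_critical

print(sample_props)
print(f"df = {df}, chi2_test = {chi2_test:.4f}, reject H0: {reject_h0}")
```

The sample proportions (0.36, 0.44 and 0.32) differ, but not by more than sampling variation would explain at this significance level.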
