There are statistical tests that compare the predictions of a model with reality and measure how significant the difference is.


Bertram Kennedy
5 years ago

Statistical Methods in Business
Lecture 11. Chi-Square ($\chi^2$) Goodness-of-Fit Test

There are statistical tests that compare the predictions of a model with reality and measure how significant the difference is. There are also statistical methods that can operate under any probability law; these are known as non-parametric methods. The chi-square goodness-of-fit test is a non-parametric test that measures the difference between a model's predictions and reality. It works reliably when we have large samples (more than 20 observations).

Let a population have different categories, a.k.a. classes, of a characteristic. The general structure of the chi-square test is as follows:

$H_0$: There is no significant difference between the model predictions and reality.
$H_1$: There is a significant difference between the model predictions and reality.
Level of significance: $\alpha$
Test statistic:

$$\chi^2_{stat} = \sum_{i} \frac{(fo_i - fe_i)^2}{fe_i}$$

where $i$ is the index of category $i$; $fo_i$ denotes the observed frequency of category $i$, i.e. the number of observations in category $i$; and $fe_i$ denotes the expected frequency of category $i$, i.e. the number of observations that the model predicts for category $i$.

The expected frequency for a category must be five or greater. Otherwise the chi-square test becomes unreliable.

The test statistic is subject to a chi-square probability law, identified by its degrees of freedom, $df$ (which is a function of the number of categories), denoted by $\chi^2_{df}$. Hence, the critical level to compare the test statistic with is $\chi^2_{critical} = \chi^2_{\alpha;df}$, i.e. $\chi^2_{critical}$ is the value that leaves a right-tail area of $\alpha$ under the probability density function of $\chi^2_{df}$.

Decision Rule: If $\chi^2_{stat} > \chi^2_{critical}$, then reject $H_0$.
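The critical value is defined above as the point that leaves a right-tail area of $\alpha$ under the chi-square density. A minimal Python sketch of that definition follows, using only the standard library. The function names `chi2_right_tail` and `chi2_critical` are illustrative and do not come from the lecture; the tail area uses the closed-form series that holds for whole-number degrees of freedom, and the critical value is located by bisection, which is one simple choice among several.

```python
import math

def chi2_right_tail(x, df):
    """Right-tail area P(chi-square with df degrees of freedom > x), integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-z)              # df = 2: tail = exp(-x/2)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(z))   # df = 1: tail = erfc(sqrt(x/2))
    # Each pass raises the degrees of freedom by 2
    # (recurrence of the regularized incomplete gamma function).
    while a < df / 2.0:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return tail

def chi2_critical(alpha, df):
    """Value whose right-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > alpha:       # expand until the tail drops below alpha
        hi *= 2.0
    for _ in range(100):                         # the tail area decreases as x grows
        mid = (lo + hi) / 2.0
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

if __name__ == "__main__":
    for df in (2, 5, 6):
        print(df, round(chi2_critical(0.05, df), 4))
```

With a critical value in hand, the decision rule is a single comparison of the test statistic against `chi2_critical(alpha, df)`.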

Alternatively, compute the p-value, p-value = $P(\chi^2 > \chi^2_{stat})$, and write the decision rule: if p-value < $\alpha$, then reject $H_0$. A small p-value indicates that a difference at least this large would be very unlikely if $H_0$ were true.

Business applications of the chi-square test are many; the most common are illustrated by the following examples.

Example 1) Uniform distribution ("Fairness"):

Model: Assume we have a fair die.
Experiment: Roll the die 120 times.
Outcome: Each category is identified by a face number.
Category: 1, 2, 3, 4, 5, 6
$fo_i$: 13, 33, 14, 7, 36, 17

Question: Is this die fair? Investigate at the 95% confidence level.

$H_0$: The die is fair.
$H_1$: The die is not fair.
$\alpha = 0.05$

Test statistic: The model assumes that all categories have an equal number of elements. Thus,

$$fe_i = \frac{\text{total number of observations}}{\text{number of categories}},$$

i.e. $fe_i = 120/6 = 20$ for all $i$. Let the number of categories be $k = 6$. We have $df = k - 1 = 5$. Then,

$$\chi^2_{stat} = \sum_{i=1}^{6} \frac{(fo_i - fe_i)^2}{fe_i} = \frac{(13-20)^2}{20} + \frac{(33-20)^2}{20} + \dots + \frac{(17-20)^2}{20} = 34.40$$
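The arithmetic for the die can be checked with a few lines of Python. This is a sketch under stated assumptions: the observed counts are read as 13, 33, 14, 7, 36, 17 (they sum to the 120 rolls and reproduce the statistic 34.40), and the critical value for $\alpha = 0.05$ with 5 degrees of freedom is hard-coded from a standard chi-square table rather than computed.

```python
# Observed face counts for faces 1..6 from Example 1 (120 rolls).
observed = [13, 33, 14, 7, 36, 17]

n = sum(observed)
k = len(observed)
expected = [n / k] * k        # fair die: every face is expected equally often

chi2_stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = k - 1

# Assumed table value of the chi-square critical point for alpha = 0.05, df = 5.
CHI2_CRITICAL_05_DF5 = 11.0705

reject_h0 = chi2_stat > CHI2_CRITICAL_05_DF5

print(f"n = {n}, fe = {expected[0]:.0f}, df = {df}")
print(f"chi-square statistic = {chi2_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```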

The p-value is $P(\chi^2 > 34.40)$, subject to a chi-square probability law with 5 degrees of freedom:

p-value $\approx 1.98 \times 10^{-6}$

Decision Rule: If p-value < $\alpha$, then reject $H_0$. Or, if $\chi^2_{stat} > \chi^2_{critical}$, then reject $H_0$.

Decision: Since p-value < $\alpha$, we reject $H_0$ at the 5% level of significance. We are 95% confident that this die is not a fair die (based on this evidence).

Example 2) "Grades on the Curve". Here, "the curve" refers to the Gaussian (normal) probability law.

MODEL: Grades are distributed according to a NORMAL PROBABILITY DISTRIBUTION.

Assume the performance of a student is measured by a numerical grade from 0 to 100, representing the degree of learning. Higher numbers indicate a higher degree of learning.

LETTER GRADE CATEGORIES: A, B+, B, C+, C, D, F
PERFORMANCE MEASURE (numerical categorization): [90, 100], [85, 90), [80, 85), [75, 80), [60, 75), [50, 60), [0, 50)

EXPERIMENT: In a class of 100 undergraduate students, the final performance measures are:

LETTER GRADE CATEGORY:   A   B+    B   C+    C    D    F
$fo_i$:                 10   10   16   17   27   13    7

Assume that the average grade is 80 out of 100 and the standard deviation of the grade is 5 out of 100, based on all students.

MODEL: $X \sim \mathcal{N}(80, 5^2)$,

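where the random variable $X$ denotes the performance measure (the grade).

EXPECTED FREQUENCIES FOR LETTER GRADE CATEGORIES:

$fe_A = P(90 \le X \le 100) \times$ (NUMBER OF STUDENTS),
$fe_{B+} = P(85 \le X < 90) \times$ (NUMBER OF STUDENTS),
$fe_B = P(80 \le X < 85) \times$ (NUMBER OF STUDENTS),
$fe_{C+} = P(75 \le X < 80) \times$ (NUMBER OF STUDENTS),
$fe_C = P(60 \le X < 75) \times$ (NUMBER OF STUDENTS),
$fe_D = P(50 \le X < 60) \times$ (NUMBER OF STUDENTS),
$fe_F = P(0 \le X < 50) \times$ (NUMBER OF STUDENTS).

Under the model, with the number of students equal to 100:

$P(0 \le X < 50) \approx 9.87 \times 10^{-10}$
$P(50 \le X < 60) \approx 3.167 \times 10^{-5}$
$P(60 \le X < 75) \approx 0.1586$
$P(75 \le X < 80) \approx 0.3413$
$P(80 \le X < 85) \approx 0.3413$
$P(85 \le X < 90) \approx 0.1359$
$P(90 \le X \le 100) \approx 0.0227$

Also, the probability outside the grade range is negligible: $P(-\infty < X < 0) \approx 0$ and $P(100 < X < \infty) \approx 3.167 \times 10^{-5}$.

$fe_A \approx 2.27$, $fe_{B+} \approx 13.59$, $fe_B \approx 34.13$, $fe_{C+} \approx 34.13$, $fe_C \approx 15.86$, $fe_D \approx 0.0032$, $fe_F \approx 9.87 \times 10^{-8}$.

Hence, we have the goodness-of-fit test:

$H_0$: Grades are distributed on the curve.
$H_1$: Grades are not distributed on the curve.
Level of significance: $\alpha = 0.05$
Test statistic:

$$\chi^2_{stat} = \sum_{i=1}^{7} \frac{(fo_i - fe_i)^2}{fe_i} \approx 4.97 \times 10^{8}$$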
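The expected frequencies above follow mechanically from the normal model, so they can be recomputed with a short Python sketch. Assumptions made here: the category bounds are read as [90, 100], [85, 90), [80, 85), [75, 80), [60, 75), [50, 60), [0, 50); the observed counts are read as 10, 10, 16, 17, 27, 13, 7 (they sum to the class size of 100); and the helper name `normal_cdf` is illustrative. The sketch also lists the categories whose expected frequency falls below five, since the lecture names that as a condition for the test to be reliable.

```python
import math

MU, SIGMA, N_STUDENTS = 80.0, 5.0, 100

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """P(X <= x) for X ~ Normal(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

# (letter grade, lower bound, upper bound, observed count)
categories = [
    ("A", 90, 100, 10),
    ("B+", 85, 90, 10),
    ("B", 80, 85, 16),
    ("C+", 75, 80, 17),
    ("C", 60, 75, 27),
    ("D", 50, 60, 13),
    ("F", 0, 50, 7),
]

# Expected frequency = class size times the model probability of the grade interval.
expected = {
    grade: N_STUDENTS * (normal_cdf(hi) - normal_cdf(lo))
    for grade, lo, hi, _ in categories
}

chi2_stat = sum(
    (fo - expected[grade]) ** 2 / expected[grade]
    for grade, _, _, fo in categories
)

# Categories that violate the "expected frequency of at least five" condition.
too_small = [grade for grade, *_ in categories if expected[grade] < 5]

for grade, lo, hi, fo in categories:
    print(f"{grade:>2}: fo = {fo:3d}, fe = {expected[grade]:.6g}")
print(f"chi-square statistic = {chi2_stat:.4g}")
print("categories with fe < 5:", too_small)
```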

CRITICAL VALUE: $\chi^2_{critical} = \chi^2_{0.05;6} = 12.5916$

p-value = $P(\chi^2 > \chi^2_{stat}) \approx 0$

Decision Rule: If p-value < $\alpha$, then reject $H_0$. Or, if $\chi^2_{stat} > \chi^2_{critical}$, then reject $H_0$.

Decision: Since p-value < $\alpha$, we reject $H_0$ at the 5% level of significance. Based on this evidence, we are 95% confident that these grades are not distributed on the curve.

Example 3) Comparing probabilities of success:

Assume that there are three different banks in our neighborhood. Let customer satisfaction be the success event for a bank. We want to compare the proportions of satisfied customers for these three banks, with 95% confidence. Let the probability of success (the proportion of satisfied customers) be $\pi_j$ for bank $j$; $j = 1, 2, 3$. We collect the following data:

            Bank-1   Bank-2   Bank-3   Total
SUCCESS        128      199      186     513
FAILURE         88       33       66     187
TOTAL          216      232      252     700

This is a contingency table, informing us about two events happening simultaneously: Event-1 = bank patronage and Event-2 = state of satisfaction.

Each combination of these two events is a joint event and represents one category. For example, there are 128 satisfied customers of Bank-1 and 66 unsatisfied customers of Bank-3. Each margin of the contingency table sums over one factor and so removes that factor's information. For example, there are 513 satisfied customers in total across the three banks. The GRAND TOTAL is the sum of the marginal totals and gives the total number of customers; in our example the grand total is 700 customers.

We have $fo_1 = 128$, $fo_2 = 199$, $fo_3 = 186$, $fo_4 = 88$, $fo_5 = 33$, $fo_6 = 66$.

We can compute the expected frequency of each category by using the contingency table information as follows:

$$fe_i = \frac{(\text{row total})_i \, (\text{column total})_i}{\text{grand total}}$$

$fe_1 = \frac{(513)(216)}{700} \approx 158.30$, $fe_2 = \frac{(513)(232)}{700} \approx 170.02$, $fe_3 = \frac{(513)(252)}{700} \approx 184.68$,
$fe_4 = \frac{(187)(216)}{700} \approx 57.70$, $fe_5 = \frac{(187)(232)}{700} \approx 61.98$, $fe_6 = \frac{(187)(252)}{700} \approx 67.32$.

These computations follow from the assumption that the two factors, customer satisfaction and bank selection, are independent. The contingency table has two rows, $r = 2$, for the customer satisfaction factor, and three columns, $c = 3$, for the bank selection factor, and the test statistic satisfies $\chi^2_{stat} \sim \chi^2_{(r-1)(c-1)}$.

Let $\pi_j$ be the probability of SUCCESS for bank $j$; $j = 1, 2, 3$.

$H_0$: $\pi_1 = \pi_2 = \pi_3$.
$H_1$: Not all $\pi_j$ are equal.
Level of significance: $\alpha = 0.05$
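The expected-frequency formula above can be sketched in Python as follows. The observed counts are the six values $fo_1, \dots, fo_6$ listed above, arranged as two rows (SUCCESS, FAILURE) by three columns (Bank-1 to Bank-3). The p-value line relies on the fact that for 2 degrees of freedom the chi-square right-tail area equals $e^{-x/2}$; for other table shapes a general chi-square tail function would be needed instead.

```python
import math

# Observed counts: rows = SUCCESS, FAILURE; columns = Bank-1, Bank-2, Bank-3.
observed = [
    [128, 199, 186],
    [88, 33, 66],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence: fe = (row total) * (column total) / (grand total).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2_stat = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The closed form below is valid only for df = 2.
assert df == 2
p_value = math.exp(-chi2_stat / 2.0)

for exp_row in expected:
    print([round(fe, 2) for fe in exp_row])
print(f"chi-square statistic = {chi2_stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```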

Test Statistic:

$$\chi^2_{stat} = \sum_{i=1}^{6} \frac{(fo_i - fe_i)^2}{fe_i} \approx 40.23$$

$\chi^2_{critical} = \chi^2_{0.05;(2-1)(3-1)} = \chi^2_{0.05;2} = 5.9915$

p-value = $P(\chi^2 > \chi^2_{stat}) = 1.84 \times 10^{-9}$

Decision Rule: If p-value < $\alpha$, then reject $H_0$. Or, if $\chi^2_{stat} > \chi^2_{critical}$, then reject $H_0$.

Decision: Since $\chi^2_{stat} > \chi^2_{critical}$, we reject $H_0$ at the 5% level of significance. Based on this evidence, we are 95% confident that not all three banks have the same proportion of satisfied customers.

Marascuilo Procedure: We can compare each pair of banks and identify which probabilities of success are significantly different. Let $(j, j')$ identify a pair (Bank-$j$, Bank-$j'$), for all $j \ne j'$. Also, let $p_j$ be the sample proportion of success for Bank-$j$.

Decision Criterion: If $|p_j - p_{j'}| >$ critical range$(j, j')$, then $\pi_j$ and $\pi_{j'}$ are significantly different, where

$$\text{critical range}(j, j') = \sqrt{\chi^2_{critical}} \, \sqrt{\frac{p_j (1 - p_j)}{n_j} + \frac{p_{j'} (1 - p_{j'})}{n_{j'}}},$$

$n_j$ is the number of customers of bank $j$, $j = 1, 2, 3$, and $n_{j'}$ is the number of customers of bank $j'$, $j' = 1, 2, 3$. Here

$$\sqrt{\chi^2_{critical}} = \sqrt{\chi^2_{\alpha;(r-1)(c-1)}} = \sqrt{5.9915} \approx 2.4478$$
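The pairwise comparison can be sketched in Python directly from the critical-range formula. Assumptions: the success counts (128, 199, 186) and customer counts (216, 232, 252) are taken from the Example 3 data; the variable names are illustrative; and the critical value uses the closed form $-2\ln\alpha$, which is exact only for 2 degrees of freedom, as in this 2 by 3 table.

```python
import math
from itertools import combinations

successes = {1: 128, 2: 199, 3: 186}   # satisfied customers per bank
customers = {1: 216, 2: 232, 3: 252}   # n_j: all customers per bank
alpha = 0.05

# df = (r - 1)(c - 1) = 2, where the critical value is exactly -2 ln(alpha).
chi2_critical = -2.0 * math.log(alpha)

# Sample proportion of success for each bank.
p = {j: successes[j] / customers[j] for j in successes}

def critical_range(j, k):
    """Marascuilo critical range for the pair (Bank-j, Bank-k)."""
    spread = p[j] * (1 - p[j]) / customers[j] + p[k] * (1 - p[k]) / customers[k]
    return math.sqrt(chi2_critical) * math.sqrt(spread)

results = {}
for j, k in combinations(sorted(p), 2):
    diff = abs(p[j] - p[k])
    cr = critical_range(j, k)
    results[(j, k)] = (diff, cr, diff > cr)
    verdict = "significantly different" if diff > cr else "not significantly different"
    print(f"banks ({j}, {k}): |diff| = {diff:.4f}, critical range = {cr:.4f}, {verdict}")
```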

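$p_1 = \frac{128}{216} \approx 0.5926$, $n_1 = 216$
$p_2 = \frac{199}{232} \approx 0.8578$, $n_2 = 232$
$p_3 = \frac{186}{252} \approx 0.7381$, $n_3 = 252$

$|p_1 - p_2| \approx 0.2652$
$|p_1 - p_3| \approx 0.1455$
$|p_2 - p_3| \approx 0.1197$

Critical range (1,2) $= \sqrt{\chi^2_{critical}} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \approx 0.0992$
Critical range (1,3) $= \sqrt{\chi^2_{critical}} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_3(1-p_3)}{n_3}} \approx 0.1063$
Critical range (2,3) $= \sqrt{\chi^2_{critical}} \sqrt{\frac{p_2(1-p_2)}{n_2} + \frac{p_3(1-p_3)}{n_3}} \approx 0.0880$

Every absolute difference exceeds its critical range. Hence, $\pi_1$ and $\pi_2$, $\pi_1$ and $\pi_3$, and $\pi_2$ and $\pi_3$ are all significantly different.

Example 4) Independence Test:

We can continue with the previous example and investigate whether the customer satisfaction factor and the bank selection factor are independent.

$H_0$: There is no significant relationship between the customer satisfaction factor and the bank selection factor.
$H_1$: There is a significant relationship between the customer satisfaction factor and the bank selection factor.
Level of significance: $\alpha = 0.05$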

Contingency Table for Dissatisfaction: the rows are the reasons for dissatisfaction (Price, Location, Service, Other) and the columns are Bank-1, Bank-2, Bank-3, with a TOTAL row and a TOTAL column ($r = 4$, $c = 3$).

Degrees of freedom $= (r-1)(c-1) = 6$.

Test Statistic:

$$\chi^2_{stat} = \sum_{i=1}^{12} \frac{(fo_i - fe_i)^2}{fe_i}$$

$\chi^2_{critical} = \chi^2_{0.05;6} = 12.5916$

Decision Rule: If $\chi^2_{stat} > \chi^2_{critical}$, then reject $H_0$. Or, if p-value < $\alpha$, then reject $H_0$.

Decision: Since $\chi^2_{stat} > \chi^2_{critical}$, we reject $H_0$ at the 5% level of significance. Based on this evidence, we are 95% confident that the customer satisfaction rate and the bank selection are not independent factors.

The chi-square goodness-of-fit test provides an easy-to-use statistical method for a variety of cases in which we want to measure the distance between the predictions of a model and reality. Yet this method has important shortcomings. It is an asymptotic method: the test statistic follows a chi-square probability law only in the sense of convergence in distribution, so a large sample is required. Also, the expected frequency of each category should be at least five; otherwise the chi-square test is unstable and unreliable. The trade-off between simplicity and usefulness must be kept in mind for any statistical method when one is making decisions based on a sample.
