Doubt is the beginning, not the end, of wisdom. (Anonymous)

The hypergeometric distribution: theoretical basis for the deviation between replicates in one germination test?
Winfried Jackisch
7th ISTA Seminar on Statistics 25
e-mail: Win.Jackisch@t-online.de
Content
1. Introduction and material
2. Analysis of the number of identical results between the replicates and the distribution of germination results in replicates and sub-groups
3. Comparison between the observed and expected variances
4. Observed ranges in comparison to the maximum tolerated ranges
5. The hypergeometric distribution: the solution of the problem
Figure: Frequency distribution of the germination results (G 4 %) in the data pool. x-axis: Germination (%); y-axis: No. of events.
Note: Survey of 9.525 germination results of commercial seed lots
Analysis of the Data Pool
The data pool was arranged in groups with identical germination (G 4 = ABCD) and divided into sub-groups [(G 1 = A, B, C, D), (G 2 = AB, ..., CD), (G 3 = ABC, ...)].
From the arranged data sets the following parameters were computed:
(i) arithmetic mean
(ii) variance
(iii) standard deviation
(iv) range between the replicates
(v) number of identical results between the replicates
The following statistical procedures were used: chi² goodness-of-fit test, F-test. Test level: alpha = 0.05
Replicates and sub-groups: A B C D | AB AC AD BC BD CD | ABC ABD ACD BCD | n=4 (ABCD)
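The parameters listed above can be sketched for a single test as follows. This is a minimal illustration, not the author's procedure: it assumes that each replicate is given as a germination percentage of 100 seeds, that a sub-group result is the mean of its replicates, and that the sample variance is meant. The function names and the example values are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev, variance


def subgroup_germination(replicates):
    """Germination (%) of every sub-group (A, ..., AB, ..., ABC, ..., ABCD).

    Assumption: a sub-group result is the mean of its replicate percentages.
    """
    names = sorted(replicates)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result["".join(combo)] = mean(replicates[k] for k in combo)
    return result


def replicate_summary(replicates):
    """Parameters (i) to (v) for the replicates of one germination test."""
    values = list(replicates.values())
    identical = [
        f"{a}={b}"
        for a, b in combinations(sorted(replicates), 2)
        if replicates[a] == replicates[b]
    ]
    return {
        "mean": mean(values),
        "variance": variance(values),   # sample variance (n - 1 denominator)
        "stdev": stdev(values),
        "range": max(values) - min(values),
        "identical": identical,
    }


# Hypothetical test with G 4 = 95%
example = {"A": 96, "B": 94, "C": 96, "D": 94}
summary = replicate_summary(example)
groups = subgroup_germination(example)
```

For the example, the range is 2 and the identical pairs are A=C and B=D; `groups` holds all 15 replicate and sub-group results.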
Number of identical germination results between the replicates in the laboratories

Lab     No.     A=B     A=C     A=D     B=C     B=D     C=D
H       6.886   1.098   1.188   1.124   1.130   1.214   1.163
J       1.048   175     166     151     151     164     160
P       512     61      60      50      54      70      81
V       347     45      29      36      36      44      50
A       309     56      61      68      47      52      49
S       236     40      42      21      27      27      17
Total   9.338   1.475   1.546   1.450   1.445   1.571   1.520

Uniform distribution: Chi² calc. < Chi² tab. (apart from the row with red numbers)
Number of identical results between the replicates in different germination groups (G%)

G%    No.    A=B    A=C    A=D    B=C    B=D    C=D
96    124    218    211    22     211    233    27
94    758    9      91     93     86     95     16
92    593    69     88     66     68     8      6
90    358    54     32     21     35     43     28
88    231    18     19     18     24     21     27
86    145    8      16     2      13     18     14
84    79     1      6      2      6      7      6

Uniform distribution: Chi² calc. < Chi² tab. (apart from the row with red numbers)
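The uniformity check behind "Chi² calc. < Chi² tab." can be sketched as a chi² goodness-of-fit test of the six pair counts against equal expected frequencies. The sketch assumes 5 degrees of freedom (six classes) and the tabulated value 11.070 for alpha = 0.05; the two count rows are hypothetical and are not taken from the tables.

```python
# Tabulated chi² value for 5 degrees of freedom at alpha = 0.05
CHI2_TAB_DF5 = 11.070


def chi2_uniform(counts):
    """Chi² statistic of observed counts against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def is_uniform(counts, chi2_tab=CHI2_TAB_DF5):
    """True if the hypothesis of a uniform distribution is not rejected."""
    return chi2_uniform(counts) < chi2_tab


# Hypothetical counts for A=B, A=C, A=D, B=C, B=D, C=D
balanced = [50, 48, 55, 47, 52, 48]
skewed = [80, 60, 40, 40, 50, 30]

chi2_balanced = chi2_uniform(balanced)   # 0.92
chi2_skewed = chi2_uniform(skewed)       # 32.0
```

The first row passes the test, the second does not, which is the situation the slides mark with red numbers.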
Figure: Distribution of germination results in the replicates with 100 tested seeds from 1.045 standardised tests (G 4 = 95%). x-axis: Germination (%); y-axis: No. of events. Series: A_obs., B_obs., C_obs., D_obs.; mean of each replicate.
Result: Chi² calc. < Chi² tab.
Figure: Comparison between the observed and expected (binomial) distribution in the replicates with 100 seeds from 1.045 tests (G 4 = 95%). x-axis: Germination (%); y-axis: No. of events. Series: A_observed, expected (binomial).
Result: Chi² calc. > Chi² tab.
Figure: Comparison between the observed and expected (binomial) distribution in the sub-groups with 200 seeds from 1.045 tests (G 4 = 95%). x-axis: Germination (%); y-axis: No. of events. Series: AB_observed, expected (binomial).
Result: Chi² calc. > Chi² tab.
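The expected (binomial) frequencies in such a comparison can be sketched as below. This is an illustration under assumptions: p is set to the group germination of 95%, the number of tests (1000) is hypothetical, and classes with an expected frequency below 5 are pooled into one class, a common convention that the slides do not confirm.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of x germinated seeds among n tested seeds."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def expected_counts(n_tests, n_seeds, p):
    """Expected number of tests for every possible germination count."""
    return {x: n_tests * binomial_pmf(x, n_seeds, p) for x in range(n_seeds + 1)}


def chi2_statistic(observed, expected, min_expected=5.0):
    """Chi² statistic; classes with a small expectation are pooled (assumption)."""
    stat = 0.0
    pooled_obs = 0.0
    pooled_exp = 0.0
    for x, e in expected.items():
        o = observed.get(x, 0)
        if e >= min_expected:
            stat += (o - e) ** 2 / e
        else:
            pooled_obs += o
            pooled_exp += e
    if pooled_exp > 0:
        stat += (pooled_obs - pooled_exp) ** 2 / pooled_exp
    return stat


# Replicate of 100 seeds, G 4 = 95%, hypothetical number of tests
expected = expected_counts(n_tests=1000, n_seeds=100, p=0.95)
most_likely = max(expected, key=expected.get)
```

An observed distribution that is narrower than the binomial one, as the slides report, produces a large statistic even when the means agree.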
Figure: Comparison between the observed variance in the germination groups and the variance expected under the binomial model. x-axis: Germination [n = 4] (%); y-axis: Variance (s²). Series: 1_observed, 1_expected (binomial), 2_observed, 2_expected (binomial), 4_binomial.
Figure: Comparison between the observed and expected variances in seed lots with 90% germination (G 4). x-axis: Replicates and sub-groups (A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD); y-axis: Variances. Series: observed, obs.-mean, expected (binom.).
Figure: Variances in seed lots with G 4 = 93% in different labs. x-axis: Replicates and sub-groups (A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD); y-axis: variance (s²). Series: A-25, H-424, J-132, P-56, S-22, V-35, expect.
Figure: Survey of the variances in sub-groups for identical certified seed lots (G 4). x-axis: Replicates and sub-groups (A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD); y-axis: variance (s²). Series: 96%, 93%, 90%, 87%, 84%.
Proportion of the observed ranges in the germination groups which are greater than the tolerated maximum range

G 4 (%)   R max.*   No. results   R obs. > R max.*
                                  absolute   relative (%)
96        8         1.240         1          0,1
95        9         1.045         4          0,4
93-94     10        1.452         4          0,3
91-92     11        1.068         7          0,7
89-90     12        654           6          0,9
87-88     13        396           6          1,6
84-86     14        338           4          1,2
81-83     15        175           2          1,1
78-80     16        92            2          2,1
Total:              6.460         36         0,56

R max.* = maximum tolerated range by ISTA
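The check behind this table can be sketched as follows: for each test, the range between the four replicates is compared with the maximum tolerated range of its germination group. The tolerance values are taken from the table above; rounding the mean half up to a whole percentage is an assumption, and all names are hypothetical.

```python
# (lowest G 4 %, highest G 4 %, maximum tolerated range R max), from the table above
R_MAX_TABLE = [
    (96, 96, 8), (95, 95, 9), (93, 94, 10), (91, 92, 11), (89, 90, 12),
    (87, 88, 13), (84, 86, 14), (81, 83, 15), (78, 80, 16),
]


def max_tolerated_range(g4):
    """Look up R max for a germination result G 4 (whole %)."""
    for low, high, r_max in R_MAX_TABLE:
        if low <= g4 <= high:
            return r_max
    raise ValueError(f"no tolerance listed for G 4 = {g4}%")


def exceeds_tolerance(replicates):
    """True if the observed range is greater than the tolerated maximum."""
    g4 = int(sum(replicates) / len(replicates) + 0.5)  # round half up (assumption)
    observed_range = max(replicates) - min(replicates)
    return observed_range > max_tolerated_range(g4)


def exceedance_rate(n_exceeding, n_results):
    """Relative frequency (%) of results with R obs. > R max."""
    return 100.0 * n_exceeding / n_results


within = exceeds_tolerance([96, 90, 95, 87])    # G 4 = 92, range 9, R max 11
outside = exceeds_tolerance([99, 88, 95, 94])   # G 4 = 94, range 11, R max 10
total_rate = round(exceedance_rate(36, 6460), 2)
```

With the totals of the table, 36 of 6.460 results exceed the tolerance, which gives the 0,56% in the last row.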
Figure: Observed variances in simulated germination results (G 4 = 78-8%) described by the hypergeometric model. x-axis: Replicates and sub-groups (A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD); y-axis: Variances. Series: expected (binomial), observed (without outliers), observed (with outliers), expected (hypergeometric).
Source: Jackisch and Vogel (24). 538 germination results contain 17 outliers (= 3,1%)
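A simulation of this kind can be sketched by fixing the total result of a test and distributing the germinated seeds at random over the replicates. This is not the simulation of the cited source: the 400 seeds per test, the 316 germinated seeds (79%), the number of runs and the random seed are all assumptions chosen for illustration.

```python
import random
from statistics import variance


def simulate_replicate_variance(total_seeds=400, germinated=316,
                                replicate_size=100, n_tests=5000, seed=1):
    """Variance of replicate A when the total result of the test is fixed.

    Each run shuffles the seeds of one test and counts the germinated
    seeds that fall into the first replicate (sampling without replacement).
    """
    rng = random.Random(seed)
    seeds = [1] * germinated + [0] * (total_seeds - germinated)
    counts = []
    for _ in range(n_tests):
        rng.shuffle(seeds)
        counts.append(sum(seeds[:replicate_size]))
    return variance(counts)


simulated = simulate_replicate_variance()
binomial_expected = 100 * 0.79 * (1 - 0.79)   # n p (1 - p), about 16.6
```

Under these assumptions the simulated variance lies clearly below the binomial expectation, which is the pattern the figure attributes to the hypergeometric model.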
Formulae of the binomial and hypergeometric distribution

Binomial distribution
\[ B(x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \qquad s^2 = n\, p\, (1 - p) \]
n: number of tested seeds
p: proportion of germinated seeds in the seed lot
x: number of germinated seeds in the sample n

Hypergeometric distribution
\[ H(x) = \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}}, \qquad s^2 = n\, \frac{M}{N} \left(1 - \frac{M}{N}\right) \left(1 - \frac{n}{N}\right) \]
N: total number of tested seeds
M: number of germinated seeds in N
n: number of tested seeds in the sub-group
x: number of germinated seeds in the sub-group n
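The two variance formulae can be compared numerically with a short sketch. The example assumes N = 400 tested seeds and M = 360 germinated seeds (90%); both values and the function names are illustrative. The slide's factor (1 - n/N) is used as given; the textbook hypergeometric variance uses (N - n)/(N - 1) instead, which is shown for comparison.

```python
def binomial_variance(n, p):
    """s² = n p (1 - p), variance of the number of germinated seeds."""
    return n * p * (1 - p)


def hypergeometric_variance(n, M, N):
    """s² = n (M/N) (1 - M/N) (1 - n/N), as given on the slide."""
    return n * (M / N) * (1 - M / N) * (1 - n / N)


def hypergeometric_variance_exact(n, M, N):
    """Textbook form with the correction factor (N - n) / (N - 1)."""
    return n * (M / N) * (1 - M / N) * (N - n) / (N - 1)


N, M = 400, 360   # assumed: 400 tested seeds, 90% germination
for n in (100, 200, 300, 400):
    print(n,
          round(binomial_variance(n, M / N), 2),
          round(hypergeometric_variance(n, M, N), 2),
          round(hypergeometric_variance_exact(n, M, N), 2))
```

For n = 100 the binomial variance is 9.0 and the hypergeometric variance is 6.75; for n = N = 400 the hypergeometric variance is zero, because the total result of the test is fixed.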
Figure: Comparison between the expected STDEV from the B- and H-model and the observed STDEV from two secondary analyses of commercial seed lots. Panels A and B. x-axis: Germination (G 4 %); y-axis: STDEV (s) between the 4 replicates. Curves: binomial, hypergeometric.
Circle: 199-1 lab (4.36 values). Triangle: 23-6 labs (9.525 values)