To compare two methods, A and B, one can collect a sample of $n$ pairs of observations. Pair $i$ provides two measurements, $Y_{Ai}$ and $Y_{Bi}$, one for each method:
- To compare the reactions of patients to two different stimuli, we may measure the reaction of each patient to both stimuli.
- To compare the performance of two sorting algorithms, both could be applied to the same data sets.

We assume that
$$Y_{Ai} = \mu_A + p_i + Z_{Ai}, \qquad Y_{Bi} = \mu_B + p_i + Z_{Bi}, \qquad i = 1, \dots, n.$$
The difference $\delta_\mu = \mu_A - \mu_B$ is of main interest. We define
$$Y_i = Y_{Ai} - Y_{Bi} = \delta_\mu + Z_i, \qquad Z_i = Z_{Ai} - Z_{Bi}, \qquad i = 1, \dots, n.$$
Using the pair differences has several advantages:
- The pair effects $p_i$ cancel; there is no need to model them.
- There is no need to model $Z_{Ai}$ and $Z_{Bi}$ separately. It is enough to assume that the $Z_i$ are independent and have zero mean.
- By taking the differences, we get one sample, $Y_1, \dots, Y_n$, instead of two.

To estimate $\delta_\mu$, we can use the sample mean: $\hat\delta_\mu = \bar Y$. Then $\hat\delta_\mu$ is an unbiased estimator of $\delta_\mu$:
$$E(\hat\delta_\mu) = E\Big(\frac{1}{n}\sum_{i=1}^n (\delta_\mu + Z_i)\Big) = \frac{1}{n}\sum_{i=1}^n \big(\delta_\mu + E(Z_i)\big) = \delta_\mu.$$
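As a sanity check, the cancellation of the pair effects and the unbiasedness of $\hat\delta_\mu = \bar Y$ can be illustrated with a small simulation sketch (all parameter values below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta_mu = 200_000, 0.5                 # hypothetical sample size and true difference
p = rng.normal(0.0, 5.0, n)                # pair effects, shared within each pair
y_A = 2.0 + delta_mu + p + rng.normal(0.0, 1.0, n)   # mu_A = 2.0 + delta_mu
y_B = 2.0 + p + rng.normal(0.0, 1.0, n)              # mu_B = 2.0
y = y_A - y_B                              # pair effects p_i cancel exactly
est = y.mean()                             # sample mean of the differences
print(est)                                 # close to delta_mu = 0.5
```

Note that the large pair-effect variance would dominate a naive two-sample comparison, while the differences are unaffected by it.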
Even if $Z_{Ai}$ and $Z_{Bi}$ are not normal random variables, the difference $Z_i = Z_{Ai} - Z_{Bi}$ can have an approximately normal distribution.

Example 1: We measure the reaction times, $Y_{Ai}$ and $Y_{Bi}$, to two different stimuli for $n = 50$ patients. The distributions of $Y_{Ai}$ and $Y_{Bi}$ are very skewed, so the normality assumption is not good. The distribution of $Y_i = Y_{Ai} - Y_{Bi}$ is closer to normal.

[Figure: histograms of reaction time 1, reaction time 2, and the reaction time differences, in seconds.]
Example 1 (contd): For the observed pair differences, $y_1, \dots, y_{50}$, assume we find $\bar y = 1.41$ and $s_y = 4.52$. Under the normality assumption,
$$T = \frac{\bar Y - \delta_\mu}{S_y/\sqrt{50}} \sim t_{49},$$
and the 95% CI for $\delta_\mu$ is
$$\Big(\bar y - t_{0.975,49}\,\frac{s_y}{\sqrt{50}},\ \bar y + t_{0.975,49}\,\frac{s_y}{\sqrt{50}}\Big) = (0.13,\ 2.69).$$
If $H_0: \delta_\mu = 0$ and $H_a: \delta_\mu \neq 0$, then $H_0$ is rejected at the 5% significance level, since zero falls outside the interval.
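This interval can be reproduced from the summary statistics alone; a minimal sketch using `scipy.stats.t` for the quantile:

```python
from math import sqrt
from scipy.stats import t

n, ybar, s_y = 50, 1.41, 4.52              # summary statistics from Example 1
se = s_y / sqrt(n)                         # standard error of the mean difference
q = t.ppf(0.975, df=n - 1)                 # t quantile with 49 degrees of freedom
lo, hi = ybar - q * se, ybar + q * se
print(round(lo, 2), round(hi, 2))          # about (0.13, 2.69)
```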
Example 2 (exercise 9.16 from the textbook). We compare the average hit rate (a measure of prediction accuracy) for two classification methods.

                  Data set 1   Data set 2   Data set 3   Data set 4
Method 1 (y_A)       0.2490       0.2545       0.2465       0.2530
Method 2 (y_B)       0.2279       0.2121       0.2313       0.2377
Difference (y)       0.0211       0.0424       0.0152       0.0153

For the analysis of the data, we first assume that we have two independent normal samples:
$$Y_{A1}, Y_{A2}, Y_{A3}, Y_{A4}\ \text{i.i.d.}\ N(\mu_A, \sigma^2), \qquad Y_{B1}, Y_{B2}, Y_{B3}, Y_{B4}\ \text{i.i.d.}\ N(\mu_B, \sigma^2).$$
Let $H_0: \delta_{A,B} = \mu_A - \mu_B = 0$ and $H_a: \delta_{A,B} > 0$. We find $\hat\delta_{A,B} = \bar y_A - \bar y_B = 0.0235$, $s_A^2 = 1.34 \times 10^{-5}$, and $s_B^2 = 1.185 \times 10^{-4}$. The pooled sample variance is
$$s_P^2 = \frac{(4-1)s_A^2 + (4-1)s_B^2}{(4-1) + (4-1)} = 6.6 \times 10^{-5}.$$
To construct the 95% CI for $\delta_{A,B}$, we use the distribution of the test statistic:
$$T = \frac{\hat\delta_{A,B} - \delta_{A,B}}{s_P\sqrt{C}} \sim t_\nu.$$
CLICKER QUESTION 1. To construct the 95% CI for $\delta_{A,B}$, we use the distribution of the test statistic
$$t = \frac{\hat\delta_{A,B} - \delta_{A,B}}{s_P\sqrt{C}} \sim t_\nu.$$
Find $C$ and $\nu$ if $n_A = n_B = 4$ in the exercise.
A. $C = 1/(n_A + n_B - 2) = 0.167$ and $\nu = n_A + n_B = 8$
B. $C = 1/(n_A + n_B - 2) = 0.167$ and $\nu = n_A + n_B - 2 = 6$
C. $C = 1/n_A + 1/n_B = 0.5$ and $\nu = n_A + n_B = 8$
D. $C = 1/n_A + 1/n_B = 0.5$ and $\nu = n_A + n_B - 2 = 6$
E. I give up
We have $H_0: \delta_{A,B} = 0$ and $H_a: \delta_{A,B} > 0$. Let
$$T = \frac{\bar Y_A - \bar Y_B - \delta_{A,B}}{S_P\sqrt{0.5}}.$$
The P-value of the test is
$$p = \Pr(T > t \mid H_0: \delta_{A,B} = 0) = \Pr(T > t \mid T \sim t_6);$$
the observed value of $T$ under $H_0$ is $t = 4.09$, and therefore $p = 0.003$. We reject $H_0$ at the 1% significance level. The one-sided 95% CI for $\delta_{A,B}$ is
$$\big(\hat\delta_{A,B} - t_{0.95,6}\,\sqrt{0.5}\,s_P,\ \infty\big) = (0.012,\ \infty).$$
We can see that zero falls outside this interval.
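These numbers can be reproduced with `scipy.stats.ttest_ind`, which performs the pooled-variance two-sample $t$-test when `equal_var=True` (a sketch; `alternative='greater'` matches the one-sided $H_a$):

```python
import numpy as np
from scipy.stats import ttest_ind

y_A = np.array([0.2490, 0.2545, 0.2465, 0.2530])   # Method 1 hit rates
y_B = np.array([0.2279, 0.2121, 0.2313, 0.2377])   # Method 2 hit rates
# pooled-variance two-sample t-test against H_a: mu_A > mu_B
res = ttest_ind(y_A, y_B, equal_var=True, alternative='greater')
print(round(res.statistic, 2), round(res.pvalue, 3))   # about 4.09 and 0.003
```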
CLICKER QUESTION 2. We have $H_0: \delta_{A,B} = 0$ and $H_a: \delta_{A,B} > 0$. We found the P-value $p = \Pr(T > t = 4.09 \mid T \sim t_6) = 0.003$. Find the P-value when testing $H_0: \delta_{A,B} = 0$ against $H_a: \delta_{A,B} \neq 0$.
A. 0.997
B. 0.994
C. 0.305
D. 0.006
E. 0.003
This analysis assumes that the two samples, A and B, are independent, which is not the case! Both methods are applied to the same data sets, so we consider the paired differences $Y_1, Y_2, Y_3, Y_4$, where $Y_i = Y_{Ai} - Y_{Bi}$, to remove the dependence between the two samples. We now assume that $Y_1, Y_2, Y_3, Y_4$ are i.i.d. $N(\delta_\mu, \sigma^2)$. We find $\hat\delta_\mu = \bar y = 0.0235$ and $s^2 = 1.66 \times 10^{-4}$. Under the normality assumption,
$$T = \frac{\bar Y - \delta_\mu}{S/\sqrt{n}} \sim t_{n-1},$$
where the sample size is $n = 4$. The observed value of $T$ under $H_0$ is $t = \bar y/(s/\sqrt{4}) = 3.644$.
The P-value is $p = \Pr(T > t = 3.644 \mid T \sim t_3) = 0.0178$. We therefore reject the null at the 2% significance level. The one-sided 95% CI for $\delta_\mu$ is
$$\Big(\bar y - t_{0.95,3}\,\frac{s}{\sqrt{4}},\ \infty\Big) = (0.0083,\ \infty).$$
The power of the test when $\delta_\mu = 0.01$ is
$$P(0.01) = \Pr\Big(T_0 = \frac{\bar Y}{S/\sqrt{4}} > t_{0.95,3} \,\Big|\, T_0 \sim t_{3,\kappa}^{NC}\Big) = 0.36,$$
where $t_{3,\kappa}^{NC}$ is the noncentral $t$-distribution with three degrees of freedom and noncentrality parameter $\kappa = 0.01\sqrt{4}/\sigma$, estimated by $\hat\kappa = 0.01\sqrt{4}/s = 1.55$.
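The paired analysis and the power calculation can be sketched with `scipy.stats.ttest_rel` and the noncentral $t$-distribution `scipy.stats.nct` (plugging in $s$ for the unknown $\sigma$, as above):

```python
import numpy as np
from scipy.stats import ttest_rel, t, nct

y_A = np.array([0.2490, 0.2545, 0.2465, 0.2530])   # Method 1 hit rates
y_B = np.array([0.2279, 0.2121, 0.2313, 0.2377])   # Method 2 hit rates
res = ttest_rel(y_A, y_B, alternative='greater')   # paired one-sided t-test
print(res.statistic, res.pvalue)                   # about 3.64 and 0.018

# power at delta_mu = 0.01, with s plugged in for the unknown sigma
n = 4
s = np.std(y_A - y_B, ddof=1)                      # sample sd of the differences
kappa = 0.01 * np.sqrt(n) / s                      # estimated noncentrality, about 1.55
crit = t.ppf(0.95, df=n - 1)                       # one-sided 5% critical value
power = nct.sf(crit, df=n - 1, nc=kappa)           # Pr(T_0 > crit) under the noncentral t
print(power)
```

The paired test has only 3 degrees of freedom instead of 6, yet it is the valid analysis here because the two samples share the same data sets.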