Adv. App. Stat. Presentation On a paradoxical property of the Kolmogorov Smirnov two-sample test

Size: px

Start display at page:

Download "Adv. App. Stat. Presentation On a paradoxical property of the Kolmogorov Smirnov two-sample test"

Clifton Merritt
5 years ago
Views:

1 Adv. App. Stat. Presentation On a paradoxical property of the Kolmogorov Smirnov two-sample test Stefan Hasselgren Niels Bohr Institute March 9, 2017 Slide 1/11

2 Bias of Kolmogorov g.o.f. test Draw a sample X 1,..., X n with unknown d.f. F. Based on these, we want to test the hypothesis where F 0 is a fixed d.f. H 0 : F = F 0, Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 2/11

3 A test is unbiased if the probability of rejecting the null hypothesis Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 3/11

4 A test is unbiased if the probability of rejecting the null hypothesis (a) is greater than (or equal to) the significance level when the alternative is true Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 3/11

5 A test is unbiased if the probability of rejecting the null hypothesis (a) is greater than (or equal to) the significance level when the alternative is true and (b) is less than (or equal to) the significance level when the null hypothesis is true. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 3/11

6 A test is unbiased if the probability of rejecting the null hypothesis (a) is greater than (or equal to) the significance level when the alternative is true and (b) is less than (or equal to) the significance level when the null hypothesis is true. The test is biased for the alternative hypothesis, if (a) is not true while (b) is still true. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 3/11

7 Now consider a test, which has the following properties: Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

8 Now consider a test, which has the following properties: 1 Reject the null hypothesis if d(g n, F 0 ) > δ α. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

9 Now consider a test, which has the following properties: 1 Reject the null hypothesis if d(g n, F 0 ) > δ α. G n is the sample d.f. of the sample X 1,..., X n, Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

10 Now consider a test, which has the following properties: 1 Reject the null hypothesis if d(g n, F 0 ) > δ α. G n is the sample d.f. of the sample X 1,..., X n, d(g n, F 0 ) is the distance in the space of d.f. s. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

11 Now consider a test, which has the following properties: 1 Reject the null hypothesis if d(g n, F 0 ) > δ α. G n is the sample d.f. of the sample X 1,..., X n, d(g n, F 0 ) is the distance in the space of d.f. s. δ α is a distance associated with the significance level α. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

12 Now consider a test, which has the following properties: 1 Reject the null hypothesis if d(g n, F 0 ) > δ α. G n is the sample d.f. of the sample X 1,..., X n, d(g n, F 0 ) is the distance in the space of d.f. s. δ α is a distance associated with the significance level α. 2 The test is distribution free. Essentially, no assumptions are made about the underlying distribution of the sample. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

13 Now consider a test, which has the following properties: 1 Reject the null hypothesis if d(g n, F 0 ) > δ α. G n is the sample d.f. of the sample X 1,..., X n, d(g n, F 0 ) is the distance in the space of d.f. s. δ α is a distance associated with the significance level α. 2 The test is distribution free. Essentially, no assumptions are made about the underlying distribution of the sample. 3 Such a test is called a "distance-based test." Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 4/11

14 Now, let s get technical. For some distribution function F ; Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 5/11

15 Now, let s get technical. For some distribution function F ; take all the d.f. s with distance d from F (so all F i where d(f i, F ) = d), Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 5/11

16 Now, let s get technical. For some distribution function F ; take all the d.f. s with distance d from F (so all F i where d(f i, F ) = d), put them in a metric space (somehow), Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 5/11

17 Now, let s get technical. For some distribution function F ; take all the d.f. s with distance d from F (so all F i where d(f i, F ) = d), put them in a metric space (somehow), then make a ball in this space with radius δ > 0 and centre at F. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 5/11

18 Now, let s get technical. For some distribution function F ; take all the d.f. s with distance d from F (so all F i where d(f i, F ) = d), put them in a metric space (somehow), then make a ball in this space with radius δ > 0 and centre at F. Label this ball B(F, δ). Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 5/11

19 Now we take a d.f. F 0, and then suppose that (for some α > 0) there exists a d.f. F a, such that the ball B(F a, δ α ) is strictly inside the ball B(F 0, δ α ), Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 6/11

20 Now we take a d.f. F 0, and then suppose that (for some α > 0) there exists a d.f. F a, such that the ball B(F a, δ α ) is strictly inside the ball B(F 0, δ α ), and there is a non-zero probability of the sample d.f. being in B(F 0, δ α ), but not in B(F a, δ α ), given that F a is the true d.f. Then the distance-based test is biased for the alternative hypothesis F a. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 6/11

21 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

22 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

23 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

24 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

25 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n B(F a, δ α )} Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

26 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n B(F a, δ α )} 1 α. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

27 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n B(F a, δ α )} 1 α. But since the ball B(F a, δ α ) is strictly inside the ball B(F 0, δ α ), then Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

28 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n B(F a, δ α )} 1 α. But since the ball B(F a, δ α ) is strictly inside the ball B(F 0, δ α ), then P Fa {G n B(F 0, δ α )} > 1 α, Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

29 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n B(F a, δ α )} 1 α. But since the ball B(F a, δ α ) is strictly inside the ball B(F 0, δ α ), then P Fa {G n B(F 0, δ α )} > 1 α, which is the same as P Fa {d(g n, F 0 ) > δ α } < α. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

30 Proof The proof is fairly simple. Start by taking a sample X 1,..., X n from F a. This sample has sample d.f. G n. Then P Fa {G n B(F a, δ α )} 1 α. But since the ball B(F a, δ α ) is strictly inside the ball B(F 0, δ α ), then P Fa {G n B(F 0, δ α )} > 1 α, which is the same as P Fa {d(g n, F 0 ) > δ α } < α. But this is strictly against the demand (a) for an unbiased test! And so the distance-based test is biased for the alternative F a. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 7/11

31 Bias of the KS two-sample test for different sample sizes Take two samples X 1,..., X m from F and Y 1,..., Y n from G. The null hypothesis is then H 0 : F = G. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 8/11

32 Bias of the KS two-sample test for different sample sizes Take two samples X 1,..., X m from F and Y 1,..., Y n from G. The null hypothesis is then H 0 : F = G. We can then easily choose or assume F and G to satisfy the conditions above. The article proves, that for equal sample size n = m, the KS two-sample test is unbiased. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 8/11

33 Bias of the KS two-sample test for different sample sizes Take two samples X 1,..., X m from F and Y 1,..., Y n from G. The null hypothesis is then H 0 : F = G. We can then easily choose or assume F and G to satisfy the conditions above. The article proves, that for equal sample size n = m, the KS two-sample test is unbiased. However, for sufficiently large nm, the KS test is biased for the alternative F = F a G, since in the limit for m we obtain the Kolmogorov g.o.f. test. Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 8/11

34 Thank you! Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 9/11

35 Example Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 10/11

36 Example Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 10/11

37 Example Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 11/11

38 Example Stefan Hasselgren (Niels Bohr Institute) Adv. App. Stat. Presentation Slide 11/11

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where