Lecture 8: Non-parametric Comparison of Location
GENOME 560, Spring 2016
Doug Fowler, GS (dfowler@uw.edu)

Review
- What do we mean by nonparametric? Descriptive stats or inference methods that don't depend (as much) on the distribution of the population being sampled
- What is a desirable location statistic for ordinal data? The median (why?)
- What are NP equivalents of a one-sample t-test? Sign test, Wilcoxon signed rank test (summary?)

Goals
- Comparing the medians of two samples using the Wilcoxon Rank Sum test
- Comparing the medians of many mutually independent samples using the Kruskal-Wallis test

Wilcoxon Rank Sum Test
- Used to test whether two samples are likely to be drawn from the same distribution or different ones
- Identical to the Mann-Whitney U test
- Pool the N = n_x + n_y observations
- Arrange into an ordered array, preserving labels
- Assign ranks to each element of the array from 1 to N
- The test statistic T_x is the sum of the ranks of X
- Reject H_0 if T_x is very large or very small compared to the possible values of T_x for n = N
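
The pooling, ordering and ranking steps can be sketched in a few lines of Python. This is only an illustration: the function name `rank_sum` and the toy data are assumptions made here, not part of the lecture, and the sketch assumes there are no tied values.

```python
def rank_sum(x, y):
    """Return T_x, the sum of the pooled ranks of the observations in x.

    Assumes no tied values, so ranks are simply 1..N in sorted order.
    """
    # Pool the observations, keeping a label for the sample each came from.
    pooled = [(value, "X") for value in x] + [(value, "Y") for value in y]
    # Arrange into an ordered array, preserving labels.
    pooled.sort(key=lambda pair: pair[0])
    # Assign ranks 1..N and add up the ranks that belong to X.
    return sum(rank for rank, (_, label) in enumerate(pooled, start=1)
               if label == "X")


# Hypothetical toy data, for illustration only.
x = [1.2, 3.4, 0.7]
y = [2.8, 5.1]
t_x = rank_sum(x, y)
print(t_x)
```

With ties, tied observations would need averaged ranks (midranks), which this sketch deliberately leaves out.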

Wilcoxon Rank Sum Test - Example
Let's say we have measured a transcript level in 6 preoperative patients (X) and 3 postoperative patients (Y). Does the surgery change transcript levels?

Pooled observations, ordered and ranked:

Observation  Sample  Rank
2.5          X       1
3            X       2
6.2          X       3
9.1          X       4
14.1         Y       5
14.3         X       6
14.7         X       7
15.6         Y       8
16.7         Y       9

T_x (the sum of the X ranks) = 1 + 2 + 3 + 4 + 6 + 7 = 23

Distribution of T_x When N is Small
- Consider a case where n_x = 2 and n_y = 3
- We know the ranks must be 1, 2, 3, 4, 5
- Again, the issue is how to assign these ranks amongst the samples X and Y
- There are C(5, 2) = 10 ways of assigning five ranks to two samples of these sizes
- Each way is equally likely under the null hypothesis, so each has a probability of 10%

Distribution of T_x When N is Small

X Ranks  Y Ranks   T_x  Probability
1, 2     3, 4, 5   3    0.10
1, 3     2, 4, 5   4    0.10
1, 4     2, 3, 5   5    0.10
2, 3     1, 4, 5   5    0.10
2, 4     1, 3, 5   6    0.10
1, 5     2, 3, 4   6    0.10
2, 5     1, 3, 4   7    0.10
3, 4     1, 2, 5   7    0.10
3, 5     1, 2, 4   8    0.10
4, 5     1, 2, 3   9    0.10

[Figure: bar plot of the probability of each value of T_x; x-axis T_x (3 to 9), y-axis probability (0.00 to 0.20)]
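
The enumeration in the table can be reproduced by listing every way of choosing n_x ranks out of 1..N. The sketch below is a minimal illustration, not the lecture's own code. The function names are made up here, and the two-sided p-value uses the null mean n_x(N + 1)/2 as the centre of the distribution, a standard result that the slides do not state.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def rank_sum_distribution(n_x, n_y):
    """Exact null distribution of T_x: maps each value to its probability."""
    n_total = n_x + n_y
    ranks = range(1, n_total + 1)
    # Every choice of n_x ranks for sample X is equally likely under H_0.
    assignments = list(combinations(ranks, n_x))
    counts = Counter(sum(chosen) for chosen in assignments)
    return {t: Fraction(c, len(assignments)) for t, c in sorted(counts.items())}


def two_sided_p(t_obs, n_x, n_y):
    """Probability of a T_x at least as far from its null mean as t_obs."""
    dist = rank_sum_distribution(n_x, n_y)
    center = Fraction(n_x * (n_x + n_y + 1), 2)  # assumed null mean of T_x
    return sum(p for t, p in dist.items()
               if abs(t - center) >= abs(t_obs - center))


# The n_x = 2, n_y = 3 case from the table.
dist = rank_sum_distribution(2, 3)
for t, p in dist.items():
    print(t, float(p))

# The transcript example: X holds ranks 1, 2, 3, 4, 6, 7, so T_x = 23.
print(float(two_sided_p(23, 6, 3)))
```

Exact enumeration is only practical for small samples, since the number of rank assignments grows combinatorially with N.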

Distribution of T_x When N is Large
- T_x will be approximately normally distributed. How do we calculate a z value?
- Subtract the mean of the sampling distribution of T_x from the value of T_x we got, and divide by the standard deviation of the sampling distribution of T_x
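
A minimal sketch of the z calculation follows. It assumes the standard no-ties results for the null distribution of T_x, which the slides do not show: mean n_x(N + 1)/2 and variance n_x n_y (N + 1)/12. It applies no continuity correction. The example reuses the transcript data (T_x = 23, n_x = 6, n_y = 3) purely for illustration, since N = 9 is small for a normal approximation.

```python
import math


def rank_sum_z(t_x, n_x, n_y):
    """z value for T_x under the large-sample normal approximation."""
    n_total = n_x + n_y
    mean_t = n_x * (n_total + 1) / 2          # assumed null mean of T_x
    var_t = n_x * n_y * (n_total + 1) / 12    # assumed null variance, no ties
    return (t_x - mean_t) / math.sqrt(var_t)


def two_sided_normal_p(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


z = rank_sum_z(23, 6, 3)
p = two_sided_normal_p(z)
print(round(z, 3), round(p, 3))
```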

Wilcoxon Rank Sum Test - Example
Let's say we have measured a transcript level in 6 preoperative patients (X) and 3 postoperative patients (Y). Does the surgery change transcript levels?
For the ranked observations (X ranks 1, 2, 3, 4, 6, 7; Y ranks 5, 8, 9): fail to reject H_0

Distribution of the Populations
- How would nonequality of the shape of the populations from which X and Y are drawn affect the test?
- Here, we would erroneously infer that the populations had different medians
- We must make the assumption that our distributions have the same shape

[Figure: population distributions with the median marked]

How Do We Test Whether The Shapes Are Equal?
- Simplest way is to use boxplots/histograms to get a sense of whether the distributions appear to be similar
- You can use formal tests for dispersion/scale parameters (e.g. Ansari-Bradley), though you have to equalize the location of the two distributions first!
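
One simple way to equalize location before comparing spread is to subtract each sample's own median. The sketch below shows only that centering step plus a crude spread summary (the interquartile range). It is not an implementation of the Ansari-Bradley test, and the data and function names are hypothetical.

```python
from statistics import median, quantiles


def center_on_median(sample):
    """Shift a sample so that its median is zero."""
    m = median(sample)
    return [value - m for value in sample]


def iqr(sample):
    """Interquartile range, using the statistics module's default quartiles."""
    q1, _, q3 = quantiles(sample, n=4)
    return q3 - q1


# Hypothetical samples with different locations and different spreads.
a = [4.1, 5.0, 5.3, 6.2, 7.8]
b = [9.0, 12.5, 14.1, 18.3, 25.0]

centered_a = center_on_median(a)
centered_b = center_on_median(b)

# After centering, both medians are zero, so what remains to compare is shape.
print(median(centered_a), median(centered_b))
print(iqr(centered_a), iqr(centered_b))
```

Centering leaves the spread of each sample unchanged, so a scale test applied to the centered samples is no longer confounded by the difference in location.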

Wilcoxon Rank Sum Test
- Generalization of the Wilcoxon Signed Rank test
- Used to test whether two samples are likely to be drawn from the same distribution or different ones
- Identical to the Mann-Whitney U test
- Often called the Mann-Whitney-Wilcoxon test (notice alphabetical order of the dead people)
- Assumptions:
  - Observations are independent
  - Observations are drawn from continuous distributions
  - The values drawn are ordered
  - The shapes of the two distributions are identical

Frank Wilcoxon
Wilcoxon lived from 1892 to 1965. He was a polymath, working as an oilman and a tree surgeon before training as a physical chemist, working in plant research and then in process control in industry. In a single paper in 1945 he published both tests that bear his name.

Kruskal-Wallis Test
- Generalization of the Wilcoxon rank sum test to 3 or more independent random samples
- Used to test whether the medians of the samples are equal
- Nonparametric version of the one-way ANOVA
- Pool all observations
- Rank the pooled samples
- Sum the ranks for each sample to get individual sample rank sums

Kruskal-Wallis Test
- Under the null hypothesis, what should be true about the relationship between any two rank sums R_i, R_j?
- Any R_i is the sum of a random sample of ranks
- Therefore, the means of any two rank sums should be equal
- In fact, the mean ranks of all k samples should be equal

Kruskal-Wallis Test
- The sum of all the sample rank sums is N(N + 1)/2, where N is the total number of pooled observations
- Given that, what is the expected value of any one average rank sum under the null hypothesis (that they are all equal)?
- (N + 1)/2, the average of the ranks 1 to N

Kruskal-Wallis Test Statistic
- Given this, how could we construct a test statistic to see if each sample's average rank deviates from the expected value?
- Sum the squared deviations of the average rank sums \bar{R}_i from the expected value (N + 1)/2:

$$\sum_{i=1}^{k} \left( \bar{R}_i - \frac{N+1}{2} \right)^2$$

- This is a sum of squares (the sum of the squared differences between each score and the expected value), the same idea that underlies:
  - Variance/standard deviation
  - Least squares regression
  - ANOVA

Kruskal-Wallis Test Statistic
Q, the Kruskal-Wallis test statistic, is the weighted sum of squares of deviations of the actual average rank sums from the expected average rank sum:

$$Q = \frac{12 \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2}{N(N+1)}$$

- \bar{R}_i: the i-th average rank sum value
- (N + 1)/2: the expected average rank sum value
- n_i: the number of observations in sample i

Kruskal-Wallis Test Statistic Distribution
- What distribution should the KW test statistic follow (hint: it is not a normal distribution)?
- Another hint: the test statistic is a sum of squares of something that IS normally distributed

The Chi-Square Distribution
The chi-square distribution is the distribution of the sum of squared independent standard normal RVs:

$$\chi^2_{df} = \sum_{i=1}^{df} Z_i^2, \quad \text{where } Z_i \sim N(0, 1)$$

The expected value and variance of the chi-square:
- E(x) = df
- Var(x) = 2 * df
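
The definition can be checked numerically by simulating sums of squared standard normal draws and comparing the sample mean and variance with df and 2 * df. This is a rough sketch. The function name, seed and number of draws are arbitrary choices made here.

```python
import random
from statistics import mean, variance


def simulate_chi_square(df, n_draws, seed=0):
    """Draw n_draws values, each the sum of df squared N(0, 1) variates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
            for _ in range(n_draws)]


draws = simulate_chi_square(df=3, n_draws=20000)

# These should land close to df = 3 and 2 * df = 6 respectively.
print(mean(draws), variance(draws))
```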

Kruskal-Wallis Test Example
Let's say we survey incoming Genome Sciences classes for the distance each student traveled to get here

Year 1  Rank   Year 2  Rank   Year 3  Rank   Year 4  Rank
164     16     1204    34     131     12     353     23
119     11     1107    33     20      3      47      4
66      6.5    414     25     444     28.5   1333    35
52      5      342     22     444     28.5   83      9
367     24     422     26     426     27     305     21
163     15     195     18     0       1      181     17
138     13     115     10     706     32     247     19
77      8                     542     31     516     30
266     20                    15      2
144     14                    66      6.5
sum     132.5          168            171.5          158
mean    13.25          24             17.15          19.75

- Rank the observations collectively
- Calculate the rank sum, R, for each class
- Calculate the rank sum average, \bar{R}, for each class
- Calculate Q:

$$Q = \frac{12 \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2}{N(N+1)}$$

Kruskal-Wallis Test Example
For the class rank sums above (N = 35, k = 4, class sizes 10, 7, 10 and 8):

$$Q = \frac{12 \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2}{N(N+1)} \approx 4.85$$

p ≈ 0.18 (chi-square distribution with k - 1 = 3 degrees of freedom)
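
The whole example can be worked through from the raw distances. The sketch below is illustrative only. It assigns tied values the average of their ranks, applies no tie correction to Q, and assumes the usual large-sample reference distribution (chi-square with k - 1 = 3 degrees of freedom), using a closed-form upper tail that holds only for 3 degrees of freedom. The grouping of values into years follows the rank sums on the slide. All function names are made up here.

```python
import math


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (rank sums, Q) for a list of samples, with no tie correction."""
    pooled = [value for group in groups for value in group]
    ranks = midranks(pooled)
    n_total = len(pooled)
    expected = (n_total + 1) / 2
    rank_sums, weighted_ss, offset = [], 0.0, 0
    for group in groups:
        group_ranks = ranks[offset:offset + len(group)]
        offset += len(group)
        rank_sums.append(sum(group_ranks))
        mean_rank = sum(group_ranks) / len(group)
        weighted_ss += len(group) * (mean_rank - expected) ** 2
    return rank_sums, 12 * weighted_ss / (n_total * (n_total + 1))


def chi_square_sf_df3(q):
    """Upper-tail probability of a chi-square with exactly 3 df."""
    return (math.erfc(math.sqrt(q / 2))
            + math.sqrt(2 * q / math.pi) * math.exp(-q / 2))


year1 = [164, 119, 66, 52, 367, 163, 138, 77, 266, 144]
year2 = [1204, 1107, 414, 342, 422, 195, 115]
year3 = [131, 20, 444, 444, 426, 0, 706, 542, 15, 66]
year4 = [353, 47, 1333, 83, 305, 181, 247, 516]

rank_sums, q = kruskal_wallis([year1, year2, year3, year4])
p = chi_square_sf_df3(q)
print(rank_sums, round(q, 2), round(p, 2))
```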

Kruskal-Wallis Test Outcome
- Given the way the test statistic/hypotheses are constructed, what does a rejection of H_0 mean?
- That the medians are not all equal (i.e. it doesn't tell you which are unequal)
- Having rejected the null, you might naturally want to know which medians are different
- Pairwise Wilcoxon rank sum tests are a way to do this, but you'll have to correct for multiple tests
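
The slides do not say which multiple-testing correction to use. One common and conservative choice is the Bonferroni correction, which divides the significance level by the number of pairwise comparisons. The sketch below shows the bookkeeping. The p-values are hypothetical placeholders, not results from the lecture data.

```python
from itertools import combinations


def bonferroni_threshold(n_groups, alpha=0.05):
    """Per-comparison level for all pairwise tests among n_groups."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


def significant_pairs(pairwise_p, alpha=0.05):
    """Keep pairs whose unadjusted p-value clears the Bonferroni threshold."""
    threshold = alpha / len(pairwise_p)
    return [pair for pair, p in pairwise_p.items() if p <= threshold]


groups = ["Year 1", "Year 2", "Year 3", "Year 4"]
pairs = list(combinations(groups, 2))  # 6 pairwise comparisons

# Hypothetical unadjusted p-values, one per pair, for illustration only.
hypothetical_p = dict(zip(pairs, [0.004, 0.30, 0.21, 0.04, 0.65, 0.52]))

print(bonferroni_threshold(len(groups)))
print(significant_pairs(hypothetical_p))
```

With four classes there are six comparisons, so a pair has to reach roughly p <= 0.0083 rather than 0.05 to be called different.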

Kruskal-Wallis Test
- Generalization of the Wilcoxon rank sum test to 3 or more independent random samples
- Used to test whether the medians of the samples are equal
- Nonparametric version of the one-way ANOVA
- Assumptions:
  - k mutually independent random samples
  - measured on at least an ordinal scale
  - drawn from a continuous distribution
  - shapes of the distributions are identical

Nonparametric Location Tests
- Can be used to perform one- or two-sample tests with fewer assumptions about the distribution from which the sample(s) are drawn
- Using signs and ranks (rather than interval values, as with parametric tests) enables this and confers other benefits:
  - More robust (resistant to outliers)
  - Can be used on ordinal data
- NP tests still have assumptions, and still must be used with care (e.g. zeroes for the sign test, ties, similarity of distributions for the rank-sum test)

AND THERE IS NO FREE LUNCH
Let's say we have measured a transcript level in 6 preoperative patients (X) and 3 postoperative patients (Y). Does the surgery change transcript levels?
(Same nine observations as in the rank sum example: X = 2.5, 3, 6.2, 9.1, 14.3, 14.7; Y = 14.1, 15.6, 16.7.)
If we assume normality and equality of variance, then a two-sample t-test gives: t ≈ -2.21 on 7 degrees of freedom (two-sided p ≈ 0.06)
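
For comparison with the rank sum test, the pooled-variance two-sample t statistic for the same nine observations can be computed as below. This is a sketch under the equal-variance assumption named on the slide. The function name is made up here, and the critical value 2.365 (two-sided 5% level, 7 degrees of freedom) is taken from standard t tables rather than from the lecture.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / standard_error, df


x = [2.5, 3, 6.2, 9.1, 14.3, 14.7]   # preoperative patients
y = [14.1, 15.6, 16.7]               # postoperative patients

t, df = pooled_t(x, y)
critical = 2.365  # assumed table value: two-sided 5% level, 7 df
print(round(t, 2), df, abs(t) > critical)
```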

AND THERE IS NO FREE LUNCH
- Generally speaking, nonparametric tests trade fewer assumptions for less power
- Different nonparametric tests perform better or worse in this regard (efficiency)
- All will do better than their parametric counterparts when assumptions are violated
- The Mann-Whitney-Wilcoxon test is particularly good, giving up little power even for normally distributed data

R Goals
- Executing nonparametric tests in R
- Playing around with different distribution shapes and test assumptions
- Examining effect size vs. test outcome

Reading/Resources
- http://www.statsoft.com/textbook/nonparametric-Statistics/button/2
- http://sci2s.ugr.es/keel/pdf/algorithm/articulo/wilcoxon1945.pdf
- http://www.mayo.edu/mayo-edu-docs/center-for-translational-science-activities-documents/berd-5-6.pdf
- Nonparametric Statistics: An Introduction, Jean Gibbons (available online through UW libraries at http://goo.gl/nerixx)