Lecture 7: Non-parametric Comparison of Location
GENOME 560, Spring 2016
Doug Fowler, GS (dfowler@uw.edu)
Review
How can we set a confidence interval on a proportion?
Assume the sampling distribution of the sample proportion $\hat{p}$ is (approximately) normal and find the appropriate z-value.
What is a contingency table?
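As a minimal sketch of the normal-approximation (Wald) interval $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, here is a Python version. The function name `proportion_ci` and the counts (40 of 100) are hypothetical, chosen only for illustration; `statistics.NormalDist` from the standard library supplies the z-value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, conf=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 40 "successes" out of 100 observations
lo, hi = proportion_ci(40, 100)
print(round(lo, 3), round(hi, 3))  # 0.304 0.496
```

This approximation is rough when n is small or $\hat{p}$ is close to 0 or 1.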
Contingency Tables [figure]
If we want to test the equality of proportions when multiple samples/classifications are involved, what can we do?
- Chi-squared test (expected count in each class > 5)
- Fisher's exact test (small class counts)
What do we mean by nonparametric?
- Non-parametric descriptive statistics, whose interpretation does not depend on the population fitting any parameterized distribution (e.g. range, median)
- Distribution-free methods of inference, which do not rely on assumptions that the data are drawn from a given probability distribution (e.g. assuming the data are normal, Poisson, etc.)
Some methods we have already talked about are distribution free: the CLT, the chi-squared test and Fisher's exact test.
Types of Data: A Review
- Nominal data are differentiated based on name only.
- Ordinal data are also differentiated by name, but they can be rank ordered (1st, 2nd, etc.).
- Interval data allow the degree of difference to be calculated, but not a ratio (e.g. temperature in °C: 20 °C is 10 degrees warmer than 10 °C, but not "twice as warm").
- Ratio data express the magnitude of a quantity as a ratio to a unit magnitude of the same type; because they have a true zero, ratios are meaningful.
When Would We Use NP Methods?
As we have seen, parametric methods are appropriate when the data are interval or ratio data and their assumptions (e.g. normality) can be verified.
In many other cases, NP methods are appropriate:
- The data are counts/frequencies of different outcomes
- The data are on an ordinal scale
- The assumptions for a parametric test are not met or cannot be verified:
  - The shape of the distribution from which the sample is drawn is unknown
  - The sample sizes are small
- There are outliers/extreme values that render the mean unhelpful
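A quick illustration of the last point, using made-up numbers (not data from the lecture): one extreme value drags the mean far from the bulk of the data, while the median barely notices.

```python
from statistics import mean, median

# Hypothetical measurements; the last value is an extreme outlier
data = [4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 98.0]

print(mean(data))    # about 18.16: far above every typical value
print(median(data))  # 5.0: stays put however large the outlier gets
```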
Goals
- Comparing the median of one sample to a given value: sign test
- Comparing the median of one sample to a given value: Wilcoxon signed rank test
- Assigning confidence intervals using non-parametric methods
Sign Test
Used to test whether a sample median M is equal to some hypothesized median $M_0$.
The null hypothesis is that, given a random sample of n observations measured on at least an ordinal scale, about half are bigger than $M_0$ and half are smaller than $M_0$:
$H_0: M = M_0$    $H_A: M \neq M_0$
The sign test statistic, S, is the number of plus signs among the differences $X_1 - M_0, \dots, X_n - M_0$.
[Figure: observations on an axis from 0 to 200; half are smaller than $M_0$ (minuses), half are bigger (plusses)]
Super simple idea: half should be bigger and half smaller!
Formalizing this idea, the chance of an observation being bigger than the median is 50%: $p = P(X_i > M_0) = 0.5$.
How should the test statistic be distributed (i.e. how can we describe getting r plus signs from n observations)? Binomially: under $H_0$, $S \sim \text{Binomial}(n, 0.5)$.
Assumptions:
- Observations are independent
- Each difference comes from the same continuous population
- Data are ordered (at least ordinal), so that the comparisons "greater than", "less than" and "equal to" have meaning
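Written out, the binomial null distribution of S (with $p = 0.5$) is the term summed in the example that follows:

```latex
P(S = r) = \binom{n}{r} (0.5)^{r} (0.5)^{n-r} = \binom{n}{r}\, 0.5^{n}, \qquad r = 0, 1, \dots, n
```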
Sign Test: Example
Let's say that we transfected cells with GFP and RFP. Then we examined them, scoring the GFP and RFP fluorescence on a continuous scale. We want to know if the GFP and RFP signal intensity is the same in each cell.
Because the data are paired, we can do a one-sample test on the difference.
What are our hypotheses?
$H_0: M_{\text{difference}} = 0$    $H_A: M_{\text{difference}} \neq 0$
To get S, we count the number of plus signs.
Then we calculate the probability of getting at most 3, or at least 11 - 3 = 8, plus signs among 11 observations (two-tailed):
$$p = \sum_{r=0}^{3} \binom{11}{r} 0.5^{r}\, 0.5^{11-r} + \sum_{r=8}^{11} \binom{11}{r} 0.5^{r}\, 0.5^{11-r}$$
p = 0.227
So we fail to reject $H_0$: the data are consistent with a median difference of 0 and give no evidence that the GFP and RFP signals differ (which is not the same as showing that they are equal).
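The two-tailed sum above can be checked with a short Python sketch. It assumes S = 3 plus signs among n = 11 observations, the count implied by the sums; `sign_test_p` is a hypothetical helper name, and `math.comb` stands in for a statistics library.

```python
from math import comb


def sign_test_p(s, n):
    """Two-sided sign-test p-value for s plus signs among n nonzero differences.

    Under H0, S ~ Binomial(n, 0.5), so each outcome has probability comb(n, r) / 2**n.
    """
    lower = sum(comb(n, r) for r in range(0, s + 1)) / 2**n  # P(S <= s)
    upper = sum(comb(n, r) for r in range(s, n + 1)) / 2**n  # P(S >= s)
    return min(1.0, 2 * min(lower, upper))


# GFP/RFP example, assuming S = 3 plus signs among n = 11 cells
print(round(sign_test_p(3, 11), 3))  # 0.227
```

Because Binomial(n, 0.5) is symmetric, doubling the smaller tail equals adding the two tails in the formula. In R, `binom.test(3, 11)` should report the same two-sided p-value.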
Sign Test: Zeroes
Zeroes (observations exactly equal to $M_0$) are unsigned, so they are removed, which reduces n.
The test is sensitive to too many zeroes: if your sample contains many zeroes, increase the measurement precision.
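A sketch of the whole procedure on raw data, including dropping zeroes, might look like this in Python. The function `sign_test` and the eight paired differences are hypothetical; with the two zeroes removed, n falls from 8 to 6.

```python
from math import comb


def sign_test(values, m0=0.0):
    """Two-sided sign test of H0: median == m0.

    Values equal to m0 have no sign, so they are dropped, reducing n.
    Under H0 the number of plus signs S is Binomial(n, 0.5).
    """
    diffs = [v - m0 for v in values if v != m0]
    n = len(diffs)
    s = sum(1 for d in diffs if d > 0)  # number of plus signs
    lower = sum(comb(n, r) for r in range(0, s + 1)) / 2**n  # P(S <= s)
    upper = sum(comb(n, r) for r in range(s, n + 1)) / 2**n  # P(S >= s)
    return s, n, min(1.0, 2 * min(lower, upper))


# Hypothetical paired differences; the two 0.0 entries are discarded
diffs = [0.3, -1.2, 0.0, 2.5, 0.8, 0.0, -0.4, 1.1]
print(sign_test(diffs))  # (4, 6, 0.6875)
```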
Wilcoxon Signed-Rank Test
Similar to the sign test, but it takes advantage of magnitudes in addition to signs, and is therefore typically more powerful.
How can we take advantage of magnitude without resorting to making assumptions about how the data are distributed? Ranks.

Data   Data, sorted   Rank
5      1              1
3      3              2
8      5              3
10     8              4
1      10             5

Ranks eliminate the effects of extreme outliers, BUT we lose information.
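To make the outlier point concrete, here is a hypothetical `simple_ranks` helper (assuming no ties) applied to the data above; inflating the largest value to 1000 leaves every rank unchanged, which is exactly the information that ranking discards.

```python
def simple_ranks(values):
    """Rank values from smallest (rank 1) to largest; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


print(simple_ranks([5, 3, 8, 10, 1]))    # [3, 2, 4, 5, 1]
# Replacing 10 with an extreme outlier leaves every rank unchanged
print(simple_ranks([5, 3, 8, 1000, 1]))  # [3, 2, 4, 5, 1]
```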
Like the sign test, it can be used to compare a single sample to a median, or the medians of two paired samples.
First, compute the differences $X_i - M_0$, recording the absolute value and the sign of each.
Next, rank the observations by the magnitude of the difference ($R_i$ = rank of the ith observation) and compute
$$W = \sum_{i=1}^{n} \operatorname{sgn}(X_i - M_0)\, R_i$$
If the data are from a symmetric distribution and W is calculated with the true median, W should be close to 0 (its expected value is 0).
Assumptions of the Wilcoxon signed-rank test:
- Observations are independent
- Observations are drawn from a continuous distribution
- The observations can be ordered
- The distribution must be symmetric
Wilcoxon Signed-Rank Test: Example
Let's say we have measured a transcript level in 10 cells. We believe the median level to be $M_0$ = 20.5 copies/cell.

i    Xi     |Xi - M0|   sgn(Xi - M0)
1    2.4    18.1        -
2    20     0.5         -
3    7.1    13.4        -
4    4.6    15.9        -
5    21.9   1.4         +
6    15.9   4.6         -
7    24.9   4.4         +
8    21.9   1.4         +
9    7.4    13.1        -
10   23.3   2.8         +
Next, sort by $|X_i - M_0|$ and rank:

i    Xi     |Xi - M0|   sgn   R     R*sgn
2    20     0.5         -     1     -1
8    21.9   1.4         +     2.5   2.5
5    21.9   1.4         +     2.5   2.5
10   23.3   2.8         +     4     4
7    24.9   4.4         +     5     5
6    15.9   4.6         -     6     -6
9    7.4    13.1        -     7     -7
3    7.1    13.4        -     8     -8
4    4.6    15.9        -     9     -9
1    2.4    18.1        -     10    -10

Ties are awarded the midrank value: the two differences of 1.4 would occupy ranks 2 and 3, so each receives 2.5.
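The ranking and the statistic $W = \sum_i \operatorname{sgn}(X_i - M_0) R_i$ can be reproduced with a Python sketch; `midranks` is a hypothetical helper written for this example. Summing the R*sgn column gives W = -27. The sketch also computes V, the sum of the positive ranks (14 here): R's `wilcox.test(x, mu = 20.5)` reports V rather than W, and the two are related by $W = 2V - n(n+1)/2$.

```python
def midranks(values):
    """Rank values from smallest (rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend `end` across the run of values tied with order[start]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        midrank = (start + end) / 2 + 1  # average of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = midrank
        start = end + 1
    return ranks


m0 = 20.5
x = [2.4, 20, 7.1, 4.6, 21.9, 15.9, 24.9, 21.9, 7.4, 23.3]

# |Xi - M0|, rounded so floating-point noise cannot split genuine ties
abs_diff = [round(abs(xi - m0), 9) for xi in x]
signs = [1 if xi > m0 else -1 for xi in x]  # no observation equals M0 here
ranks = midranks(abs_diff)

W = sum(s * r for s, r in zip(signs, ranks))
V = sum(r for s, r in zip(signs, ranks) if s > 0)  # sum of positive ranks
print(W, V)  # -27.0 14.0
```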
Distribution of W: A Simple Example
Consider the case where n = 3.
The ranks must be 1, 2 and 3; the only question is which are positive and which are negative.
There are $2^3 = 8$ different ways of associating signs with ranks, each equally likely under $H_0$:

Positive ranks   Negative ranks   W    Probability
None             1,2,3            -6   0.125
1                2,3              -4   0.125
2                1,3              -2   0.125
3                1,2              0    0.125
1,2              3                0    0.125
1,3              2                2    0.125
2,3              1                4    0.125
1,2,3            None             6    0.125

The sum of signed ranks could range from $-n(n+1)/2$ to $n(n+1)/2$ (here -6 to 6).
What is the minimum p-value for n = 3? 1/8 = 0.125 (≈ 0.13), e.g. $P(W \ge 6)$ for a one-sided test; a two-sided test cannot go below 2/8 = 0.25.
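The table can be generated by brute force for any small n: enumerate all $2^n$ sign patterns and tally W. A Python sketch (the function name `signed_rank_null` is hypothetical), using exact fractions so the probabilities print as 1/8 rather than 0.13:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def signed_rank_null(n):
    """Exact null distribution of W = sum of signed ranks, for n untied observations.

    Under H0 each of the 2**n ways of attaching +/- to ranks 1..n is equally likely.
    """
    counts = Counter(
        sum(sign * rank for sign, rank in zip(signs, range(1, n + 1)))
        for signs in product((1, -1), repeat=n)
    )
    return {w: Fraction(c, 2 ** n) for w, c in sorted(counts.items())}


dist = signed_rank_null(3)
for w, prob in dist.items():
    print(w, prob)
# -6 1/8, -4 1/8, -2 1/8, 0 1/4, 2 1/8, 4 1/8, 6 1/8 (one per line)

# Smallest achievable p-values for n = 3
print(dist[6])      # one-sided, P(W >= 6): 1/8
print(2 * dist[6])  # two-sided: 1/4
```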
Distribution of W
The number of possibilities ($2^n$ sign patterns) grows with n, and the distribution of W eventually becomes approximately normal.
So a table of exact values is used for n < 10, and the normal approximation for n > 10.
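For the normal approximation, a standard result is that under $H_0$ (continuous data, no ties) W has mean 0 and variance $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$, since each rank i contributes $+i$ or $-i$ with probability 1/2. A minimal Python sketch follows; `signed_rank_normal_p` is a hypothetical name, and it applies neither a tie correction nor a continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_normal_p(w, n):
    """Two-sided p-value for W via the large-sample normal approximation.

    Under H0 (continuous data, no ties), W has mean 0 and
    variance n(n + 1)(2n + 1) / 6. No tie or continuity correction is applied.
    """
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 6)
    z = w / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustration with the transcript example: W = -27, n = 10
print(round(signed_rank_normal_p(-27, 10), 3))
```

For the transcript example this gives a rough p-value of about 0.17; since n = 10 sits at the boundary of the rule above and the data contain a tie, an exact calculation would be the safer choice there.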
Confidence Interval Estimates for M
How can we assign a confidence interval to the median? For ordinal data?
A 95% confidence interval for M corresponds to the range of values $M_0$ for which a two-sided test of $H_0: M = M_0$ would not reject $H_0$ at α = 0.05.
First, we arrange our data in order of relative magnitude.
We would like to find k corresponding to the 95% CI such that
$$P(X_k \le M \le X_{n-k+1}) \ge 0.95$$
where $X_k$ is the kth smallest observation and $X_{n-k+1}$ is the kth largest.
Recall that this means the interval-building procedure captures the true median in 95% of repeated samples.
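Use Sign Test Formalism to Find k
Recall that under $H_0$ the number of plus signs S follows a Binomial(n, 0.5) distribution. The interval from $X_k$ to $X_{n-k+1}$ misses M only if k - 1 or fewer observations fall on one side of M, so
$$P(X_k \le M \le X_{n-k+1}) = 1 - 2\,P(S \le k-1)$$
For a 95% CI, k - 1 is therefore the largest number of plus signs with $P(S \le k-1) \le 0.025$.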
Example
Let's say we measure the difference in temperature of patients before and after surgery. We would like to place a 95% CI on the median.

Tdiff   Tdiff, sorted   Rank
-1.4    -5.5            1
-0.6    -3.2            2
-0.2    -3.2            3
-0.9    -2.4            4
-3.2    -1.4            5
-3.2    -0.9            6
-2.4    -0.7            7
-0.7    -0.6            8
-5.5    -0.3            9
0.1     -0.2            10
-0.1    -0.1            11
-0.3    0.1             12

Find the median value: with n = 12 it is the mean of the 6th and 7th sorted values, (-0.9 + -0.7)/2 = -0.8.
Use the cumulative binomial distribution to find the largest number of plus signs whose tail probability is at most 0.025. Which would you pick?
pbinom(2, 12, prob = 0.5) = 0.019
pbinom(3, 12, prob = 0.5) = 0.073
So 2 is our (conservative) pick: we exclude the 2 smallest and the 2 largest observations, so k = 3, and the actual coverage is 1 - 2(0.019) ≈ 0.96.
-3.2 to -0.2 (the 3rd smallest and 3rd largest values) gives a ~95% confidence interval for the median.
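The whole procedure fits in a short Python sketch. The helper `binom_cdf` is a hypothetical stand-in for R's `pbinom`; `median_ci` finds the largest cutoff c with $P(S \le c) \le 0.025$ (c = 2 for n = 12) and returns the (c+1)th smallest and (c+1)th largest values, along with the exact coverage $1 - 2P(S \le c)$.

```python
from math import comb


def binom_cdf(c, n, p=0.5):
    """P(S <= c) for S ~ Binomial(n, p); plays the role of R's pbinom(c, n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(c + 1))


def median_ci(data, conf=0.95):
    """Distribution-free CI for the median, based on the sign test."""
    x = sorted(data)
    n = len(x)
    alpha_half = (1 - conf) / 2
    # largest cutoff c with P(S <= c) <= alpha/2; c stays -1 if none qualifies
    c = -1
    while binom_cdf(c + 1, n) <= alpha_half:
        c += 1
    if c < 0:
        raise ValueError("sample too small for this confidence level")
    k = c + 1  # drop the c smallest and the c largest observations
    coverage = 1 - 2 * binom_cdf(c, n)
    return x[k - 1], x[n - k], coverage


temps = [-1.4, -0.6, -0.2, -0.9, -3.2, -3.2, -2.4, -0.7, -5.5, 0.1, -0.1, -0.3]
lo, hi, cov = median_ci(temps)
print(lo, hi, round(cov, 3))  # -3.2 -0.2 0.961
```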
R Goals
Scripts in R: Why? How?
Don't worry; we will cover nonparametric tests in R next time.