Lecture 7: No-parametric Compariso of Locatio GENOME 560 Doug Fowler, GS (dfowler@uw.edu) 1
Review How ca we set a cofidece iterval o a proportio? 2
What do we mea by oparametric? 3
Types of Data A Review Nomial data are differetiated based o ame oly 4
Types of Data A Review Ordial data are also differetiated by ame but they ca be rak ordered (1 st, 2 d, etc) 5
Types of Data A Review Iterval data allow for degree of differece to be calculated, but ot a ratio 6
Types of Data A Review Ratio data are a ratio betwee the magitude of a quatity ad a uit magitude of the same type 7
Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified 8
Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified I may other cases, NP methods are appropriate 9
Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified I may other cases, NP methods are appropriate Data are couts/frequecies of differet outcomes Data are o a ordial scale 10
Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified I may other cases, NP methods are appropriate Data are couts/frequecies of differet outcomes Data are o a ordial scale Assumptios for parametric test ot met or caot be verified Shape of the distributio from which sample is draw is ukow Whe sample sizes are small 11
Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified I may other cases, NP methods are appropriate Data are couts/frequecies of differet outcomes Data are o a ordial scale Assumptios for parametric test ot met or caot be verified Shape of the distributio from which sample is draw is ukow Whe sample sizes are small There are outliers/extreme values rederig the mea uhelpful 12
Goals Comparig the media of oe sample to a give value sig test Comparig the media of oe sample to a give value Wilcoxo siged rak test Assigig cofidece itervals usig o-parametric methods 13
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 14
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 H 0 : M = M 0 H A : M 6= M 0 15
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs X 1 M 0,...,X M 0 H 0 : M = M 0 H A : M 6= M 0 The sig test statistic, S, is the umber of plus sigs amog the differeces 16
Sig Test 0 50 100 150 200 Half smaller (miuses) Half bigger Used to test if a sample (plusses) media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs X 1 M 0,...,X M 0 H 0 : M = M 0 H A : M 6= M 0 Super simple idea: half should be bigger ad half smaller! 17
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs X 1 M 0,...,X M 0 H 0 : M = M 0 H A : M 6= M 0 p = P (X i >M 0 )=0.5 Formalizig this idea the chace of a observatio beig bigger tha the media is 50% 18
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs H 0 : M = M 0 H A : M 6= M 0 How should the test statistic be distributed (i.e. how ca we describe gettig r plus sigs from observatios)? 19
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 Assumptios: Observatios are assumed to be idepedet Each differece comes from the same cotiuous populatio Data are ordered (at least ordial) such that the comparisos greater tha, less tha ad equal to have meaig 20
Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. We wat to kow if the GFP ad RFP sigal itesity is the same i each cell 21
Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. Because the data are paired, we ca do a oe sample test o the differece 22
Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. To get S, we cout the umber of plus sigs 23
Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. 3X r=0 11 r 0.5 r 0.5 (11 r) + X11 r=8 11 r 0.5 r 0.5 (11 r) The we calculate the probability of gettig at most 3 or at least 11-3 = 8 plus sigs amogst 11 observatios (two tailed) 24
Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. p =0.227 So, we coclude that the media of the differeces is 0, ad that the GFP/RFP sigals are equal 25
Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 Zeroes are usiged ad removed (hece reducig ) Sesitive to too may zeros - if your sample cotais may zeroes, icrease measuremet precisio 26
Goals Comparig the media of oe sample to a give value sig test Comparig the media of oe sample to a give value Wilcoxo siged rak test Assigig cofidece itervals usig o-parametric methods 27
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful 28
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful How ca we take advatage of magitude without resortig to makig assumptios about how the data is distributed? 29
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples 30
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples 31
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples First, compute the differece, recordig the absolute value ad the sig 32
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples First, compute the differece, recordig the absolute value ad the sig Next, rak each observatio by the magitude of the differece ad compute W (R i = rak of i th observatio) 33
Wilcoxo Siged-Rak Test If data are from symmetric distributio, W = 0 Similar to sig test, but takes advatage of magitudes if calculated with i additio to sigs ad is therefore more true media powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples First, compute the differece, recordig the absolute value ad the sig Next, rak each observatio by the magitude of the differece ad compute W 34
Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples Assumptios: Observatios are idepedet Observatios are draw from a cotiuous distributio The observatios ca ordered The distributio must be symmetric 35
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. 36
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. 37
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) 1 2.4 18.1-2 20 0.5-3 7.1 13.4-4 4.6 15.9-5 21.9 1.4 + 6 15.9 4.6-7 24.9 4.4 + 8 21.9 1.4 + 9 7.4 13.1-10 23.3 2.8 + 38
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) 1 2.4 18.1-2 20 0.5-3 7.1 13.4-4 4.6 15.9-5 21.9 1.4 + 6 15.9 4.6-7 24.9 4.4 + 8 21.9 1.4 + 9 7.4 13.1-10 23.3 2.8 + 39
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) 1 2.4 18.1-2 20 0.5-3 7.1 13.4-4 4.6 15.9-5 21.9 1.4 + 6 15.9 4.6-7 24.9 4.4 + 8 21.9 1.4 + 9 7.4 13.1-10 23.3 2.8 + 40
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) R R*sg 2 20 0.5-1 -1 8 21.9 1.4 + 2.5 2.5 5 21.9 1.4 + 2.5 2.5 10 23.3 2.8 + 4 4 7 24.9 4.4 + 5 5 6 15.9 4.6-6 -6 9 7.4 13.1-7 -7 3 7.1 13.4-8 -8 4 4.6 15.9-9 -9 1 2.4 18.1-10 -10 Ties are awarded the midrak value 41
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) R R*sg 2 20 0.5-1 -1 8 21.9 1.4 + 2.5 2.5 5 21.9 1.4 + 2.5 2.5 10 23.3 2.8 + 4 4 7 24.9 4.4 + 5 5 6 15.9 4.6-6 -6 9 7.4 13.1-7 -7 3 7.1 13.4-8 -8 4 4.6 15.9-9 -9 1 2.4 18.1-10 -10 42
Distributio of W A Simple Example Cosider the case where =3 43
Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative 44
Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative There are 2 3 = 8 differet ways of associatig sigs with raks, each equally likely 45
Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative There are 2 3 = 8 differet ways of associatig sigs with raks, each equally likely Positive Raks Negative Raks Value of W Probability Noe 1,2,3-6 0.13 1 2,3-4 0.13 2 1,3-2 0.13 3 1,2 0 0.13 1,2 3 0 0.13 1,3 2 2 0.13 2,3 1 4 0.13 1,2,3 Noe 6 0.13 46
Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative There are 2 3 = 8 differet ways of associatig sigs with raks, each equally likely Positive Raks Negative Raks Value of W Probability Noe 1,2,3-6 0.13 1 2,3-4 0.13 2 1,3-2 0.13 3 1,2 0 0.13 1,2 3 0 0.13 1,3 2 2 0.13 2,3 1 4 0.13 1,2,3 Noe 6 0.13 Sum of raks could rage from (+1)/2 to (+1)/2 47
Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative There are 2 3 = 8 differet ways of associatig sigs with raks, each equally likely Positive Raks Negative Raks Value of W Probability Noe 1,2,3-6 0.13 1 2,3-4 0.13 2 1,3-2 0.13 3 1,2 0 0.13 1,2 3 0 0.13 1,3 2 2 0.13 2,3 1 4 0.13 1,2,3 Noe 6 0.13 Sum of raks could rage from (+1)/2 to (+1)/2 48
Distributio of W A Simple Example The umber of possibilities grows with 49
Distributio of W The umber of possibilities grows with, ad evetually becomes ormally distributed So, table is used for <10, ormal approximatio for >10 50
Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) R R*sg 2 20 0.5-1 -1 8 21.9 1.4 + 2.5 2.5 5 21.9 1.4 + 2.5 2.5 10 23.3 2.8 + 4 4 7 24.9 4.4 + 5 5 6 15.9 4.6-6 -6 9 7.4 13.1-7 -7 3 7.1 13.4-8 -8 4 4.6 15.9-9 -9 1 2.4 18.1-10 -10 51
Goals Comparig the media of oe sample to a give value sig test Comparig the media of oe sample to a give value Wilcoxo siged rak test Assigig cofidece itervals usig o-parametric methods 52
Cofidece Iterval Estimates for M How ca we assig a cofidece iterval to the media? 53
Cofidece Iterval Estimates for M How ca we assig a cofidece iterval to the media? For ordial data? 54
Cofidece Iterval Estimates for M How ca we assig a cofidece iterval to the media? For ordial data? A 95% cofidece iterval for M would correspod to the rage of values i a two-sided hypothesis test M=M 0 that would lead to acceptace of H 0 with α = 0.05 55
Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude 56
Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: X k is the k th from the smallest observatio; X -k+1 is the k th from the largest 57
Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: Recall that this iterval will cotai the true media 95% of the time, o log-ru samplig 58
Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: 59
Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: 60
Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: 61
Use Sig Test Formalism to Fid k Recall 62
Use Sig Test Formalism to Fid k Recall For a 95% CI, k correspods to the miimum value of S (i.e. umber of plus sigs) we could observe with P <0.95 63
Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 64
Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 65
Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 Fid the media value 66
Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 pbiom(2, 12, prob=0.5) = 0.019 pbiom(3, 12, prob=0.5) = 0.073 Use the cumulative biomial distributio to fid miimum umber of plus sigs which would you pick? 67
Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 pbiom(2, 12, prob=0.5) = 0.019 pbiom(3, 12, prob=0.5) = 0.073 So, k = 2 is our (coservative) pick for the 95% CI 68
Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12-3.2 to -0.2 give a ~95% cofidece iterval for the media So, k = 2 is our (coservative) pick for the 95% CI 69
R Goals Scripts i R Why? How? Do t worry; we will cover oparametric tests i R ext time 70