Lecture 7: Non-parametric Comparison of Location. GENOME 560 Doug Fowler, GS

Lecture 7: No-parametric Compariso of Locatio GENOME 560 Doug Fowler, GS (dfowler@uw.edu) 1

Review How ca we set a cofidece iterval o a proportio? 2

What do we mea by oparametric? 3

Types of Data A Review Nomial data are differetiated based o ame oly 4

Types of Data A Review Ordial data are also differetiated by ame but they ca be rak ordered (1 st, 2 d, etc) 5

Types of Data A Review Iterval data allow for degree of differece to be calculated, but ot a ratio 6

Types of Data A Review Ratio data are a ratio betwee the magitude of a quatity ad a uit magitude of the same type 7

Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified 8

Whe Would We Use NP Methods? As we have see, parametric methods are appropriate whe the data are iterval or ratio data where assumptios (e.g. ormality) ca be verified I may other cases, NP methods are appropriate Data are couts/frequecies of differet outcomes Data are o a ordial scale Assumptios for parametric test ot met or caot be verified Shape of the distributio from which sample is draw is ukow Whe sample sizes are small 11

Goals Comparig the media of oe sample to a give value sig test Comparig the media of oe sample to a give value Wilcoxo siged rak test Assigig cofidece itervals usig o-parametric methods 13

Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 14

Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs X 1 M 0,...,X M 0 H 0 : M = M 0 H A : M 6= M 0 The sig test statistic, S, is the umber of plus sigs amog the differeces 16

Sig Test 0 50 100 150 200 Half smaller (miuses) Half bigger Used to test if a sample (plusses) media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs X 1 M 0,...,X M 0 H 0 : M = M 0 H A : M 6= M 0 Super simple idea: half should be bigger ad half smaller! 17

Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs X 1 M 0,...,X M 0 H 0 : M = M 0 H A : M 6= M 0 p = P (X i >M 0 )=0.5 Formalizig this idea the chace of a observatio beig bigger tha the media is 50% 18

Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 S = umber of plus sigs H 0 : M = M 0 H A : M 6= M 0 How should the test statistic be distributed (i.e. how ca we describe gettig r plus sigs from observatios)? 19

Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 Assumptios: Observatios are assumed to be idepedet Each differece comes from the same cotiuous populatio Data are ordered (at least ordial) such that the comparisos greater tha, less tha ad equal to have meaig 20

Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. We wat to kow if the GFP ad RFP sigal itesity is the same i each cell 21

Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. Because the data are paired, we ca do a oe sample test o the differece 22

Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. To get S, we cout the umber of plus sigs 23

Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. 3X r=0 11 r 0.5 r 0.5 (11 r) + X11 r=8 11 r 0.5 r 0.5 (11 r) The we calculate the probability of gettig at most 3 or at least 11-3 = 8 plus sigs amogst 11 observatios (two tailed) 24

Sig Test - Example Let s say that we trasfected cells with GFP ad RFP. The, we examied them, scorig the GFP ad RFP fluorescece i a cotiuous way. p =0.227 So, we coclude that the media of the differeces is 0, ad that the GFP/RFP sigals are equal 25

Sig Test Used to test if a sample media M is equal to some hypothesized media M 0 The ull hypothesis is that give a radom sample of observatios measured o at least a ordial scale about half are bigger tha M 0 ad half are smaller tha M 0 Zeroes are usiged ad removed (hece reducig ) Sesitive to too may zeros - if your sample cotais may zeroes, icrease measuremet precisio 26

Goals Comparig the media of oe sample to a give value sig test Comparig the media of oe sample to a give value Wilcoxo siged rak test Assigig cofidece itervals usig o-parametric methods 27

Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful 28

Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful How ca we take advatage of magitude without resortig to makig assumptios about how the data is distributed? 29

Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples First, compute the differece, recordig the absolute value ad the sig 32

Wilcoxo Siged-Rak Test If data are from symmetric distributio, W = 0 Similar to sig test, but takes advatage of magitudes if calculated with i additio to sigs ad is therefore more true media powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples First, compute the differece, recordig the absolute value ad the sig Next, rak each observatio by the magitude of the differece ad compute W 34

Wilcoxo Siged-Rak Test Similar to sig test, but takes advatage of magitudes i additio to sigs ad is therefore more powerful Like the sig test, it ca be used to compare a sigle sample to a media or the medias of two paired samples Assumptios: Observatios are idepedet Observatios are draw from a cotiuous distributio The observatios ca ordered The distributio must be symmetric 35

Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. 36

Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. 37

Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) 1 2.4 18.1-2 20 0.5-3 7.1 13.4-4 4.6 15.9-5 21.9 1.4 + 6 15.9 4.6-7 24.9 4.4 + 8 21.9 1.4 + 9 7.4 13.1-10 23.3 2.8 + 38

Wilcoxo Siged-Rak Test - Example Let s say we have measured a trascript level i 10 cells. We believe the media level to be 20.5 copies/cell. i X i X i M o sg(x i M o ) R R*sg 2 20 0.5-1 -1 8 21.9 1.4 + 2.5 2.5 5 21.9 1.4 + 2.5 2.5 10 23.3 2.8 + 4 4 7 24.9 4.4 + 5 5 6 15.9 4.6-6 -6 9 7.4 13.1-7 -7 3 7.1 13.4-8 -8 4 4.6 15.9-9 -9 1 2.4 18.1-10 -10 Ties are awarded the midrak value 41

Distributio of W A Simple Example Cosider the case where =3 43

Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative 44

Distributio of W A Simple Example Cosider the case where =3 Raks must be 1, 2 ad 3; the oly questio is which are positive ad which are egative There are 2 3 = 8 differet ways of associatig sigs with raks, each equally likely Positive Raks Negative Raks Value of W Probability Noe 1,2,3-6 0.13 1 2,3-4 0.13 2 1,3-2 0.13 3 1,2 0 0.13 1,2 3 0 0.13 1,3 2 2 0.13 2,3 1 4 0.13 1,2,3 Noe 6 0.13 46

Distributio of W A Simple Example The umber of possibilities grows with 49

Distributio of W The umber of possibilities grows with, ad evetually becomes ormally distributed So, table is used for <10, ormal approximatio for >10 50

Goals Comparig the media of oe sample to a give value sig test Comparig the media of oe sample to a give value Wilcoxo siged rak test Assigig cofidece itervals usig o-parametric methods 52

Cofidece Iterval Estimates for M How ca we assig a cofidece iterval to the media? 53

Cofidece Iterval Estimates for M How ca we assig a cofidece iterval to the media? For ordial data? 54

Cofidece Iterval Estimates for M How ca we assig a cofidece iterval to the media? For ordial data? A 95% cofidece iterval for M would correspod to the rage of values i a two-sided hypothesis test M=M 0 that would lead to acceptace of H 0 with α = 0.05 55

Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude 56

Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: X k is the k th from the smallest observatio; X -k+1 is the k th from the largest 57

Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: Recall that this iterval will cotai the true media 95% of the time, o log-ru samplig 58

Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: 59

Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: 60

Cofidece Iterval Estimates for M First, we arrage our data i order of relative magitude We would like to fid k correspodig to the 95% CI such that: 61

Use Sig Test Formalism to Fid k Recall 62

Use Sig Test Formalism to Fid k Recall For a 95% CI, k correspods to the miimum value of S (i.e. umber of plus sigs) we could observe with P <0.95 63

Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 pbiom(2, 12, prob=0.5) = 0.019 pbiom(3, 12, prob=0.5) = 0.073 Use the cumulative biomial distributio to fid miimum umber of plus sigs which would you pick? 67

Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12 pbiom(2, 12, prob=0.5) = 0.019 pbiom(3, 12, prob=0.5) = 0.073 So, k = 2 is our (coservative) pick for the 95% CI 68

Example Let s say we measure the differeces temperature of patiets before ad after surgery. We would like to place a 95% CI o the media Tdiff Tdiff, sorted Rak -1.4-5.5 1-0.6-3.2 2-0.2-3.2 3-0.9-2.4 4-3.2-1.4 5-3.2-0.9 6-2.4-0.7 7-0.7-0.6 8-5.5-0.3 9 0.1-0.2 10-0.1-0.1 11-0.3 0.1 12-3.2 to -0.2 give a ~95% cofidece iterval for the media So, k = 2 is our (coservative) pick for the 95% CI 69

R Goals Scripts i R Why? How? Do t worry; we will cover oparametric tests i R ext time 70