Lecture 7: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS (dfowler@uw.edu)

Review
How can we set a confidence interval on a proportion? Assume the sampling distribution of p is normal and find the appropriate z-value.
What is a contingency table?
If we want to test the equality of proportions when multiple samples/classifications are involved, what can we do?

What do we mean by nonparametric?
Non-parametric descriptive statistics, whose interpretation does not depend on the population fitting any parameterized distribution (e.g. range, median)
Distribution-free methods of inference, which do not rely on assumptions that data are drawn from a given probability distribution

Types of Data: A Review
Nominal data are differentiated based on name only
Ordinal data are also differentiated by name, but they can be rank ordered (1st, 2nd, etc.)
Interval data allow for the degree of difference to be calculated, but not a ratio
Ratio data are a ratio between the magnitude of a quantity and a unit magnitude of the same type

When Would We Use NP Methods?
As we have seen, parametric methods are appropriate when the data are interval or ratio data where assumptions (e.g. normality) can be verified.
In many other cases, NP methods are appropriate:
Data are counts/frequencies of different outcomes
Data are on an ordinal scale
Assumptions for the parametric test are not met or cannot be verified
Shape of the distribution from which the sample is drawn is unknown
Sample sizes are small

Goals
Comparing the median of one sample to a given value: sign test
Comparing the median of one sample to a given value: Wilcoxon signed-rank test
Assigning confidence intervals using non-parametric methods

Sign Test
Used to test if a sample median M is equal to some hypothesized median $M_0$.
The null hypothesis is that, given a random sample of observations measured on at least an ordinal scale, about half are bigger than $M_0$ and half are smaller than $M_0$.
$H_0: M = M_0$
$H_A: M \neq M_0$
The sign test statistic, S, is the number of plus signs among the differences $X_1 - M_0, \ldots, X_n - M_0$.
Super simple idea: half should be smaller (minuses) and half bigger (plusses)!
Formalizing this idea, the chance of an observation being bigger than the median is 50%: $p = P(X_i > M_0) = 0.5$.
How should the test statistic be distributed (i.e. how can we describe getting r plus signs from n observations)?
Assumptions:
Observations are assumed to be independent
Each difference comes from the same continuous population
Data are ordered (at least ordinal) such that the comparisons greater than, less than and equal to have meaning

Sign Test - Example
Let's say that we transfected cells with GFP and RFP. Then, we examined them, scoring the GFP and RFP fluorescence in a continuous way. We want to know if the GFP and RFP signal intensity is the same in each cell.
Because the data are paired, we can do a one-sample test on the difference.
What are our hypotheses?
$H_0: M_{\text{difference}} = 0$
$H_A: M_{\text{difference}} \neq 0$
To get S, we count the number of plus signs.
Then we calculate the probability of getting at most 3 or at least 11 - 3 = 8 plus signs amongst 11 observations (two-tailed):
$$p = \sum_{r=0}^{3} \binom{11}{r} 0.5^{r}\, 0.5^{(11-r)} + \sum_{r=8}^{11} \binom{11}{r} 0.5^{r}\, 0.5^{(11-r)} = 0.227$$
So, we cannot reject $H_0$: the data are consistent with a median difference of 0, i.e. with equal GFP/RFP signals.
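The two-tailed probability in this example can be checked numerically. The sketch below assumes, as the binomial terms on the slide imply, that S follows a binomial distribution with p = 0.5 under the null hypothesis; the helper name `binom_pmf` is ours, not from the lecture.

```python
from math import comb

def binom_pmf(r, n, p=0.5):
    """Probability of exactly r plus signs among n observations."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, s = 11, 3  # 11 observations, 3 plus signs (values from the slide)

lower_tail = sum(binom_pmf(r, n) for r in range(0, s + 1))      # P(S <= 3)
upper_tail = sum(binom_pmf(r, n) for r in range(n - s, n + 1))  # P(S >= 8)
p_value = lower_tail + upper_tail

print(round(p_value, 3))  # 0.227
```

Because the distribution is symmetric when p = 0.5, the two tails are equal, so doubling the lower tail gives the same answer.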

Sign Test
Zeroes are unsigned and removed (hence reducing n).
Sensitive to too many zeros: if your sample contains many zeroes, increase measurement precision.
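Putting the pieces together, a sign test on raw data might look like the following minimal sketch. It assumes an exact two-sided test with S distributed as Binomial(n, 0.5) under the null hypothesis; the function name `sign_test` and the input values are hypothetical and not taken from the lecture.

```python
from math import comb

def sign_test(values, m0):
    """Two-sided exact sign test of H0: median == m0.

    Returns (S, n, p_value), where S is the number of plus signs and
    n is the number of non-zero differences.
    """
    diffs = [x - m0 for x in values]
    nonzero = [d for d in diffs if d != 0]   # zeroes carry no sign: drop them
    n = len(nonzero)
    s = sum(1 for d in nonzero if d > 0)     # number of plus signs
    # Under H0, S ~ Binomial(n, 0.5); take the smaller tail and double it.
    k = min(s, n - s)
    tail = sum(comb(n, r) for r in range(k + 1)) / 2**n
    p_value = min(1.0, 2 * tail)
    return s, n, p_value

# Hypothetical paired differences (illustration only)
example_diffs = [0.4, -1.2, 0.0, -0.7, -2.1, 0.9, -0.3, -1.8, 0.0, -0.5]
s, n, p = sign_test(example_diffs, m0=0)
print(s, n, round(p, 3))
```

Note how the two zero differences reduce n from 10 to 8, which is why many zeroes cost the test power.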

Wilcoxon Signed-Rank Test
Similar to the sign test, but takes advantage of magnitudes in addition to signs and is therefore more powerful.
How can we take advantage of magnitude without resorting to making assumptions about how the data are distributed? Ranks.
Ranking eliminates the effects of extreme outliers, BUT we lose information.

Data    Data, sorted    Ranks
5       1               1
3       3               2
8       5               3
10      8               4
1       10              5
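A small sketch of the trade-off, using the five values from the slide. The helper `ranks` is ours and assumes there are no ties; the outlier value of 1000 is a made-up illustration.

```python
def ranks(values):
    """Rank each value by its position in sorted order (1 = smallest).

    Assumes no ties; tied values would need midranks.
    """
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

data = [5, 3, 8, 10, 1]            # the "Data" column from the slide
print(sorted(data))                # [1, 3, 5, 8, 10]
print(ranks(data))                 # [3, 2, 4, 5, 1]

# Replace the largest value with an extreme outlier: the ranks do not change,
# but the information about how far away it lies has been lost.
with_outlier = [5, 3, 8, 1000, 1]
print(ranks(with_outlier))         # [3, 2, 4, 5, 1]
```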

Wilcoxon Signed-Rank Test
Like the sign test, it can be used to compare a single sample to a median or the medians of two paired samples.
First, compute the difference, recording the absolute value and the sign.
Next, rank each observation by the magnitude of the difference and compute W, the sum of the signed ranks.
If the data are from a symmetric distribution, W is expected to be 0 when calculated with the true median.
Assumptions:
Observations are independent
Observations are drawn from a continuous distribution
The observations can be ordered
The distribution must be symmetric

Wilcoxon Signed-Rank Test - Example
Let's say we have measured a transcript level in 10 cells. We believe the median level to be 20.5 copies/cell.

i     X_i     |X_i - M_0|    sgn(X_i - M_0)
1     2.4     18.1           -
2     20      0.5            -
3     7.1     13.4           -
4     4.6     15.9           -
5     21.9    1.4            +
6     15.9    4.6            -
7     24.9    4.4            +
8     21.9    1.4            +
9     7.4     13.1           -
10    23.3    2.8            +

Sorted by the magnitude of the difference and ranked (R):

i     X_i     |X_i - M_0|    sgn(X_i - M_0)    R      R*sgn
2     20      0.5            -                 1      -1
8     21.9    1.4            +                 2.5    2.5
5     21.9    1.4            +                 2.5    2.5
10    23.3    2.8            +                 4      4
7     24.9    4.4            +                 5      5
6     15.9    4.6            -                 6      -6
9     7.4     13.1           -                 7      -7
3     7.1     13.4           -                 8      -8
4     4.6     15.9           -                 9      -9
1     2.4     18.1           -                 10     -10

Ties are awarded the midrank value.
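The ranking step, including midranks for ties, can be reproduced with a short sketch. It assumes W is the sum of the R*sgn column, consistent with the W values tabulated for n = 3 later in the lecture; the helper `midranks` is ours, and zero differences (none occur here) are not handled.

```python
def midranks(magnitudes):
    """Rank magnitudes from smallest (1) to largest, giving ties the midrank."""
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])
    result = [0.0] * len(magnitudes)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next magnitude is tied with this one.
        while (end + 1 < len(order)
               and magnitudes[order[end + 1]] == magnitudes[order[start]]):
            end += 1
        mid = (start + 1 + end + 1) / 2   # average of the 1-based ranks in the block
        for j in range(start, end + 1):
            result[order[j]] = mid
        start = end + 1
    return result

x = [2.4, 20, 7.1, 4.6, 21.9, 15.9, 24.9, 21.9, 7.4, 23.3]  # X_i from the slide
m0 = 20.5

diffs = [round(v - m0, 1) for v in x]     # round to the data's one decimal place
magnitudes = [abs(d) for d in diffs]
r = midranks(magnitudes)
signed = [rank if d > 0 else -rank for rank, d in zip(r, diffs)]

w = sum(signed)                            # sum of signed ranks
w_plus = sum(v for v in signed if v > 0)   # sum of the positive ranks only
print(w, w_plus)                           # -27.0 14.0
```

Some texts and software report the sum of positive ranks instead of the signed sum, so both are shown.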

Distribution of W: A Simple Example
Consider the case where n = 3.
Ranks must be 1, 2 and 3; the only question is which are positive and which are negative.
There are $2^3 = 8$ different ways of associating signs with ranks, each equally likely (probability 1/8 = 0.125).

Positive Ranks    Negative Ranks    Value of W    Probability
None              1,2,3             -6            0.125
1                 2,3               -4            0.125
2                 1,3               -2            0.125
3                 1,2               0             0.125
1,2               3                 0             0.125
1,3               2                 2             0.125
2,3               1                 4             0.125
1,2,3             None              6             0.125
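The same table can be generated by brute force for any small n. This is a sketch that assumes untied ranks 1..n and equally likely signs; the function name `w_distribution` is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def w_distribution(n):
    """Exact null distribution of W (sum of signed ranks) for n untied ranks."""
    counts = Counter()
    for signs in product((-1, 1), repeat=n):   # all 2**n sign assignments
        w = sum(sign * rank for sign, rank in zip(signs, range(1, n + 1)))
        counts[w] += 1
    total = 2 ** n
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}

dist = w_distribution(3)
for w, prob in dist.items():
    print(w, prob)
```

W = 0 arises in two ways (3 positive, or 1 and 2 positive), so it carries probability 2/8 in total.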

Distribution of W
The number of possibilities grows with n, and the distribution of W eventually becomes approximately normal.
So, a table is used for n < 10, the normal approximation for n > 10.
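A sketch of the normal approximation, under the assumption that W is the sum of signed ranks with no ties: each rank i then contributes +i or -i with probability 1/2, so E[W] = 0 and Var(W) = n(n+1)(2n+1)/6. The function name is ours, and no tie or continuity correction is applied. The input W = -27 with n = 10 is the sum of the R*sgn column in the transcript example, computed here rather than quoted from the slides.

```python
from math import erf, sqrt

def w_normal_approx(w, n):
    """Two-sided p-value for W (sum of signed ranks) via a normal approximation.

    Under H0: E[W] = 0 and Var(W) = 1^2 + ... + n^2 = n(n+1)(2n+1)/6.
    No tie or continuity correction is applied.
    """
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 6)
    z = w / sd
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF at |z|
    return z, 2 * (1 - phi)

z, p = w_normal_approx(w=-27, n=10)
print(round(z, 2), round(p, 2))
```

With n = 10 this sits right at the boundary the lecture gives, so an exact table would be the safer choice for that example.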

Confidence Interval Estimates for M
How can we assign a confidence interval to the median? For ordinal data?
A 95% confidence interval for M corresponds to the range of values $M_0$ for which a two-sided hypothesis test of $M = M_0$ would not reject $H_0$ at α = 0.05.
First, we arrange our data in order of relative magnitude.
We would like to find k corresponding to the 95% CI such that the interval from $X_k$ to $X_{n-k+1}$ contains the true median 95% of the time, where $X_k$ is the k-th from the smallest observation and $X_{n-k+1}$ is the k-th from the largest.

Use Sign Test Formalism to Find k
Recall that under $H_0$ the number of plus signs, S, follows a binomial distribution with p = 0.5.
For a 95% CI, k corresponds to the minimum value of S (i.e. number of plus signs) we could observe with P < 0.95.

Example
Let's say we measure the difference in temperature of patients before and after surgery. We would like to place a 95% CI on the median.

Tdiff    Tdiff, sorted    Rank
-1.4     -5.5             1
-0.6     -3.2             2
-0.2     -3.2             3
-0.9     -2.4             4
-3.2     -1.4             5
-3.2     -0.9             6
-2.4     -0.7             7
-0.7     -0.6             8
-5.5     -0.3             9
0.1      -0.2             10
-0.1     -0.1             11
-0.3     0.1              12

Use the cumulative binomial distribution to find the minimum number of plus signs. Which would you pick?
pbinom(2, 12, prob=0.5) = 0.019
pbinom(3, 12, prob=0.5) = 0.073
So, k = 2 is our (conservative) pick for the 95% CI.
-3.2 to -0.2 gives a ~95% confidence interval for the median.
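The search for k can be sketched in a few lines. `binom_cdf` stands in for R's `pbinom(s, n, prob=0.5)`; the function names are ours. One assumption to flag: with s the largest count whose lower-tail probability is at most α/2, the sketch reads off the (s+1)-th observation from each end, a convention chosen because it reproduces the slide's interval of -3.2 to -0.2.

```python
from math import comb

def binom_cdf(s, n):
    """P(S <= s) for S ~ Binomial(n, 0.5); analogue of R's pbinom(s, n, 0.5)."""
    return sum(comb(n, r) for r in range(s + 1)) / 2**n

def median_ci(values, alpha=0.05):
    """Sign-test-based confidence interval for the median.

    Finds the largest s with P(S <= s) <= alpha/2 and returns s, the
    (s+1)-th smallest and (s+1)-th largest observations, and the
    exact coverage of that interval.
    """
    x = sorted(values)
    n = len(x)
    s = -1
    while s + 1 <= n // 2 and binom_cdf(s + 1, n) <= alpha / 2:
        s += 1
    if s < 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * binom_cdf(s, n)
    return s, (x[s], x[n - s - 1]), coverage

tdiff = [-1.4, -0.6, -0.2, -0.9, -3.2, -3.2, -2.4, -0.7, -5.5, 0.1, -0.1, -0.3]
k, interval, coverage = median_ci(tdiff)
print(k, interval, round(coverage, 3))
```

The exact coverage comes out slightly above 95%, which is why the lecture calls k = 2 the conservative pick.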

R Goals
Scripts in R: Why? How?
Don't worry; we will cover nonparametric tests in R next time.