Sequential Analysis & Testing Multiple Hypotheses,

Size: px

Start display at page:

Download "Sequential Analysis & Testing Multiple Hypotheses,"

Roger Ray
5 years ago
Views:

Sequential Analysis & Testing Multiple Hypotheses, CS57300 -

1 Sequential Analysis & Testing Multiple Hypotheses, CS Data Mining Spring 2016 Instructor: Bruno Ribeiro 2016 Bruno Ribeiro

2 This Class: Sequential Analysis Testing Multiple Hypotheses Nonparametric Tests Independence Tests 2

3 Sequential Analysis 3

4 Sequential Probability-ratio Test Inspector needs max of 5 broken bulbs out of 100 to reject lot We can clearly stop after k < 100 bulbs if we observe 5 broken ones But if k = 10 and we have observed 4 broken bulbs. Aren t we confident we can stop? 4

5 Motivation: The New York Times Dilemma Select 50% users to see headline A Titanic Sinks Select 50% users to see headline B Ship Sinks Killing Thousands Hypothesis? H 0 : % page views A = % page views B Assign half the readers to headline A and half to headline B? If A is much better than B then we could reject hypothesis H 0 quickly 5

Consider experiment order Reject hypothesis before end of experiment (see slide 11 for context) Early hypothesis rejection We should not have to wait to declare hypothesis is

6 Consider experiment order Reject hypothesis before end of experiment (see slide 11 for context) Early hypothesis rejection We should not have to wait to declare hypothesis is rejected Problem: Complete experiment: P [Y13 13 X H0 ] = 0.5k (1 k k=11 0.5)n k = < 0.01 First 7 of Paul s predictions: P [Y7 = 7 H0 ] = 0.57 = <

7 Example Application: Jung et al Detect whether a remote host is a port scanner or a benign host Ground truth: based on percentage of local hosts which a remote host has a failed connection Example of parameters: for a scanner, the probability of hitting inactive local host is 0.8 for a benign host, that probability is 0.1 Ack: M. Jordan 7

8 Wald s Sequential Probability Ratio Test Starts with two hypotheses Example: A remote host attempts to connect a local host at time i let Xi = 0 if the connection attempt is a success, 1 if failed connection As outcomes X1, X2, are observed we wish to determine whether host is a scanner or not Two competing hypotheses: H0: remote host is benign: P [X i =1 H 0 ]=0.1 H1: remote host is a scanner: P [X i =1 H 1 ]=0.8 Calculate the cumulative sum of the log-likelihood ratio S 0 =0 S i = S i 1 + log P [X i H 0 ] log P [X i H 1 ] 8

9 Sn Cumulative Likelihood ratio Threshold a 0 Stopping time (N) Threshold b False alarm rate = P [R =1 H 0 ] Misdetection rate = P [R =0 H 1 ]

10 10

Paul the Octopus (2008-2010) Paul was an animal oracle Paul's keepers would present him with two

11 Paul the Octopus ( ) Paul was an animal oracle Paul's keepers would present him with two boxes containing food Whichever teams is in the box Paul chooses first is the predicted winner 11

) Xi = ( 1, if Paul predicts correct outcome 0, otherwise Variable of interest: Y13 = 13 X Xi i=1 Hypothesis happened by

12 Hypothesis Testing Paul the Octopus as an Oracle Random variable (i.i.d.) Xi = ( 1, if Paul predicts correct outcome 0, otherwise Variable of interest: Y13 = 13 X Xi i=1 Hypothesis happened by chance REJECTED! What is the Null Hypothesis? Paul is not an animal oracle Mathematical definition? H0 := P[Xi = 1] = p = 0.5 ) P [Y13 = k H0 ] = k (1 k 0.5)n k Should we reject H0 with significance level 0.05? (one-sided test) P [Y13 13 X H0 ] = 0.5k (1 k k=11 0.5)n k = <

13 Anything Wrong in our Hypothesis Test? 13

14 Hypothesis Test Possible Outcomes Actual Situation Truth Do Not Reject H 0 Reject H 0 H 0 True P [H 0 H 0 ] 1 Type I error (false positive) P [ H 0 H 0 ] H 0 False Type II error (false negative) P [H 0 H 0 ] P [ H 0 H 0 ] 1 14

15 Xi = 1, if Paul predicts correct outcome 0, otherwise R is random variable that defines if hypothesis is rejected if P [Y13 k H0 ] < 0.05 then R = 1; otherwise R = 0 k correct predictions by animal Binomial Distribution (p=0.5) Binomial Distribution 10 H0 ] = P [R = 1 H0 ] = P [Y Density ( Number of Correct Predictions 12 15

16 16

17 Familywise Error (probability of rejecting a true hypothesis in multiple hypotheses tests) Paul Peter Paloma Philis Probability we reject not an oracle hypothesis of Paul based on chance alone? P [R = 1 H0 ] = Probability we reject not an oracle hypothesis of one or more animals (Paul, Peter, Paloma,Philis) 1 (1 P [R = 1 H0 ])4 = 0.17 P[R=0 H0]4 = Probability we correctly reject all 4 hypotheses 17

Bonferoni s correction Used when there aren t too many hypotheses Tends to be too conservative for large number of hypotheses Paul Peter Paloma Philis Per-hypothesis significance level of m

18 Bonferoni s correction Used when there aren t too many hypotheses Tends to be too conservative for large number of hypotheses Paul Peter Paloma Philis Per-hypothesis significance level of m hypotheses: α/m In our animal oracle example: Old significance level α=0.05 Bonferoni s corrected significance level α'=0.05/4 = Hypothesis test: Paul is not an animal oracle 13 X 13 P [Y13 11 H0 ] = 0.5k (1 0.5)n k = < k k=11 Effect on P [H0 H0 ]? 18

19 False Discovery Rate Often used for large number of tests Bonferoni s correction seeks to ensure that no true hypotheses are rejected Low statistical power for large number of hypotheses (rejects no hypotheses m >> 1) False Discovery Rate: Controls: mp [ H 0 H 0 ] Greater statistical power at expense of more false positives Order p-values of all m tests: p 1 apple p 2 apple apple p m Holm s Method: p i =min((m i + 1)p i, 1) Reject if adjusted p-value < α Benjamini-Hochberg method: Reject j null hypothesis if p j apple j m 19

20 Back to Single (Two-sample) Hypothesis Tests 20

21 Nonparametric Test Procedures Not Related to Population Parameters Example: Probability Distributions, Independence Data Values not Directly Used Uses Ordering of Data Examples: Wilcoxon Rank Sum Test, Komogorov-Smirnov Test 21

22 22

23 Nonparametric Testing of Distributions Two-sample Kolmogorov-Smirnov Test Do X (0) and X (1) come from same underlying distribution? Hypothesis (same distribution) rejected at level p if Sample size correction Confidence interval factor Empirical The K-S test is less sensitive when the differences between curves is greatest at the beginning or the end of the distributions. Works best when distributions differ at center. Wikipedia Good reading: M. Tygert, Statistical tests for whether a given set of independent, identically distributed draws comes from a specified probability density. PNAS

24 24

25 Chi-Squared Test Twitter users have features gender and number of tweets. We want to determine whether gender is related to number of tweets. Use chi-square test for independence 25

26 When to use Chi-Squared test When to use chi-square test for independence: Uniform sampling design Categorical features Population is significantly larger than sample The hypotheses: H 0 : variables are independent H 1 variables are not independent O i = the number of observations of type i e.g. (gender = Male, N > 100 tweets) E i = the expected frequency of type i if H 0 is true e.g. (population_size P[gender = Male]P[N > 100 tweets]) 2 = nx i=1 (O i E i ) 2 E i If H 0 is true χ 2 is distributed as a sum square of random variables ~ N(0,1) 26

27 Example Chi-Squared Test men = c(300, 100, 40) women = c(350, 200, 90) data = as.data.frame(rbind(men, women)) names(data) = c('large', 'med', small') data chisq.test(data) Reject H 0 (p<0.05) means 27

28 Next Class Select 50% users to see headline A Titanic Sinks Select 50% users to see headline B Ship Sinks Killing Thousands If we just care about page visitors we never need to decide which one is best 28

Data Mining. CS57300 Purdue University. March 22, 2018

Data Mining. CS57300 Purdue University. March 22, 2018 Data Mining CS57300 Purdue University March 22, 2018 1 Hypothesis Testing Select 50% users to see headline A Unlimited Clean Energy: Cold Fusion has Arrived Select 50% users to see headline B Wedding War