Non-Parametric Statistics: When Normal Isn't Good Enough

Professor Ron Fricker
Naval Postgraduate School
Monterey, California
1/28/13

A Bit About Me
- Academic credentials
  - Ph.D. and M.A. in Statistics, Yale University
  - M.S. in Ops Research, George Washington University
  - B.S. in Ops Analysis, U.S. Naval Academy
- (Some) research interests
  - Biosurveillance: how to quickly detect a bioterrorist attack
  - Survey design and analysis
  - Quality control and statistical process control
  - Military manpower and personnel issues: recruiting, retention, effects of deployment
- Contact information
  - Phone: 831-656-3048 (DSN: 756-3048)
  - E-mail: rdfricker@nps.edu
  - Web site: http://faculty.nps.edu/rdfricke

Outline
- Discuss advantages and disadvantages of nonparametric tests
- Describe some nonparametric tests
  - One sample data
  - Paired data
  - Multiple groups
- Illustrate application with various real-world datasets
- Show how to implement them in Excel, JMP, and R

Parametric vs. Nonparametric Hypothesis Testing
- Parametric hypothesis testing:
  - The distribution of the test statistic is specified (often normal)
  - Often follows from the Central Limit Theorem, but sometimes the CLT assumptions don't fit or apply
    - Perhaps the sample size is too small or the underlying population distribution is too skewed
  - For ANOVA, the CLT doesn't apply: the data are assumed to follow a normal distribution
- Nonparametric hypothesis testing:
  - Does not assume a particular probability distribution
  - Often called "distribution free"
  - Generally based on ordering or order statistics

Challenges in Parametric Hypothesis Testing
- Sometimes experiments give responses that defy exact quantification
  - Want to rank the "utility" of four weapons systems
  - Gives an ordering, but it can be impossible to say things like "System A is twice as useful as B"
- Sometimes the distributional assumptions of parametric tests are violated
  - Want to compare two LVS maintenance programs, but the data are clearly non-normal
- If the data do not fit the assumptions of the (parametric) tests, what to do?

Advantages of Nonparametric Tests
- Nonparametric tests make less stringent demands on the data
  - E.g., they require fewer assumptions
  - Usually require independent observations (or independence of paired differences)
  - Sometimes assume continuity of the measure
- Can be more appropriate:
  - When measures are not precise
  - For ordinal data where scale is not obvious
  - When only ordering of data is available

Disadvantages of Nonparametric Tests
- They may "throw away" information
  - E.g., the sign test only uses the signs (+ or -) of the data, not the numeric values
  - If the other information is available and there is an appropriate parametric test, that test will be more powerful
- The trade-off:
  - Parametric tests are more powerful if the assumptions are met
  - Nonparametric tests provide a more general result if they are powerful enough to reject

(Some) Nonparametric Tests
  Parametric Test      Equivalent Nonparametric Test
  One-sample t-test    Sign test, signed rank test
  Two-sample t-test    Sign test, Wilcoxon rank sum test / Mann-Whitney U test
  Paired t-test        Sign test, signed rank test
  One-way ANOVA        Kruskal-Wallis test
- Also:
  - Friedman test
  - Bootstrap-based methods
  - Spearman's rank correlation coefficient
  - Kolmogorov-Smirnov test for comparing two distributions
  - McNemar's test

Sign Test Hypotheses and Assumptions
- Basic nonparametric test procedure to assess the plausibility of the null
- Assumption: Observations $x_1, \ldots, x_n$ are independent and identically distributed from some unknown distribution $F$
- Hypotheses: $H_0: F(x) = p_0$ vs. $H_a: F(x) \neq p_0$
- Common to test $p_0 = 0.5$; i.e., test the median
  - For symmetric distributions, equivalent to testing the mean
- Can also test quartiles or any other percentile

Sign Test: Methodology
- First, delete all observations equal to $x$ (and decrement $n$ accordingly)
- Calculate $S(x) = \#\{x_i \le x\}$ and note that $S(X) \sim \mathrm{Bin}(n, p_0)$ if the null hypothesis is true
- Test the appropriate alternative hypothesis, where $p = F(x)$:
  Alternative Hypothesis    p-value
  $H_a: p > p_0$            $\Pr(S(X) \ge S(x))$ = 1-BINOMDIST(S(x)-1, n, p_0, 1)
  $H_a: p < p_0$            $\Pr(S(X) \le S(x))$ = BINOMDIST(S(x), n, p_0, 1)
  $H_a: p \neq p_0$         $2\Pr(S(X) \ge S(x))$ if $S(x) > np_0$; $2\Pr(S(X) \le S(x))$ if $S(x) < np_0$
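The exact binomial calculation can also be sketched outside of Excel. The Python below is a minimal illustration, not the course's implementation: the function names and the sample data are hypothetical, S is assumed to count the observations below the hypothesized value once ties with that value are deleted, and returning a p-value of 1 when S equals n p_0 exactly is a choice made here because the slide leaves that case open.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(S <= k) for S ~ Bin(n, p), like Excel's BINOMDIST(k, n, p, 1)."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))


def sign_test(data, x0, p0=0.5, alternative="two-sided"):
    """Exact sign test of H0: F(x0) = p0; returns (s, n, p_value)."""
    kept = [v for v in data if v != x0]      # delete observations equal to x0
    n = len(kept)
    s = sum(1 for v in kept if v < x0)       # S(x0): observations below x0
    upper = 1.0 - binom_cdf(s - 1, n, p0)    # Pr(S >= s)
    lower = binom_cdf(s, n, p0)              # Pr(S <= s)
    if alternative == "greater":             # Ha: p > p0
        return s, n, upper
    if alternative == "less":                # Ha: p < p0
        return s, n, lower
    if s > n * p0:                           # two-sided, upper tail doubled
        return s, n, min(1.0, 2 * upper)
    if s < n * p0:                           # two-sided, lower tail doubled
        return s, n, min(1.0, 2 * lower)
    return s, n, 1.0                         # s == n * p0 (assumed convention)


# Hypothetical down times in days, not the data from the examples
days = [3, 7, 8, 12, 16, 21, 30, 42, 58, 75]
print(sign_test(days, x0=10))
```

Passing `alternative="greater"` or `alternative="less"` selects the one-sided rows of the table; the default doubles the appropriate tail.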

Large Sample Approximation
- For large $n$, the binomial can be approximated by the normal:
  $$\frac{S(x)}{n} \sim N\left(F(x), \frac{F(x)(1 - F(x))}{n}\right) = N\left(p_0, \frac{p_0(1 - p_0)}{n}\right)$$
- Requirement: $np_0 > 5$ and $n(1 - p_0) > 5$
- Then, for a two-sided test, with
  $$z = \frac{S(x) - np_0}{\sqrt{np_0(1 - p_0)}}$$
  the p-value $\approx 2\Phi(-|z|)$

Example #2: MK-48 LVS Down Time
- Data set has 9,505 observations
  - Clearly not normally distributed
- Mean of all 9,505 obs = 63.25 days and the median is 35 days
- Think of this set as the population
[Figure: histogram of down time; x-axis: Days Down, y-axis: Number]

Example #2, continued
- What if we only had 30 observations?
- Okay to assume the mean is normally distributed?
  - Nope; in this case, n = 30 is not sufficient for the CLT
[Figure: Histogram of Means of Random Samples of Size n=30; x-axis: means, y-axis: Density]

Example #2, continued
- Then let's use the sign test to test whether the median down time is greater than 10 days
- Since $np_0 = 30 \times 0.5 = 15 > 5$, can do the large sample calc:
  $$z = \frac{S(x) - np_0}{\sqrt{np_0(1 - p_0)}} = \frac{21 - 15}{\sqrt{30(0.5)(1 - 0.5)}} = \frac{6}{2.74} = 2.19$$
- So, p-value $\approx \Phi(-2.19) = 0.014$
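The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, n = 30 and S(x) = 21 are taken from the example, and the standard normal CDF is built from the standard library's error function.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sign_test_z(s, n, p0):
    """Large-sample z statistic for the sign test."""
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("need n*p0 > 5 and n*(1-p0) > 5 for the normal approximation")
    return (s - n * p0) / sqrt(n * p0 * (1 - p0))


z = sign_test_z(s=21, n=30, p0=0.5)
p_one_sided = normal_cdf(-z)             # upper-tail p-value, as on the slide
p_two_sided = 2 * normal_cdf(-abs(z))    # two-sided version of the same test
print(round(z, 2), round(p_one_sided, 3))
```

The one-sided value is the one the example reports; the two-sided value is shown only for comparison with the general formula.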

Sign Test in Other Software: JMP
[Screenshot: JMP output, annotated to mark $p_0$ and $1 - p_0$ and the estimated p-value]

Sign Test in Other Software: R
- Just use the binom.test() function
- From previous example:

Sign Test for Paired Observations
- Consider $n$ pairs of observations $(x_i, y_i)$
  - $x_1, \ldots, x_n$ is a random sample from $F$, and
  - $y_1, \ldots, y_n$ is a random sample from $G$, where $G(y) = F(y - \theta)$ for some unknown value $\theta$
- Want to test the hypothesis that the distribution of the Xs and Ys is the same except perhaps for the location: $H_0: \theta = 0$ vs. $H_a: \theta \neq 0$
- Can easily adapt the sign test to this problem
  - Idea: Define $d_i = x_i - y_i$. Then under the null hypothesis, the probability that $d_i$ is positive is 0.5
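A minimal Python sketch of the paired version follows. The function name and the data are hypothetical, and dropping pairs with a zero difference is an assumption carried over from the one-sample rule of deleting observations equal to the hypothesized value.

```python
from math import comb


def paired_sign_test(xs, ys):
    """Two-sided exact sign test of H0: theta = 0; returns (n_pos, n, p_value)."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]   # drop zero differences
    n = len(diffs)
    n_pos = sum(1 for d in diffs if d > 0)              # number of positive d_i
    # Under H0, n_pos ~ Bin(n, 0.5), which is symmetric, so the two-sided
    # p-value is twice the smaller tail probability (capped at 1).
    k = min(n_pos, n - n_pos)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2**n
    return n_pos, n, min(1.0, 2 * tail)


# Hypothetical paired measurements, for illustration only
x = [12, 15, 9, 20, 14, 18, 11, 16]
y = [10, 15, 7, 13, 15, 12, 8, 11]
print(paired_sign_test(x, y))
```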

Location Shift Assumption
[Figure: $F(y)$ and $G(y)$ plotted against $y$, offset from each other by $\theta$, where $G(y) = F(y - \theta)$]

Example #3: INSURV Material Inspections vs. Ship Trials
- For 2008-2011 data, test whether there is a location difference in percent SAT for ship trials versus MIs
- Solution:

Example #4: AAAV Maintenance Rates at I and III MEF
- Test whether the AAAV maintenance rates are different between the two MEFs
- Perhaps could use a parametric approach to test the mean maintenance rates:
[Figure: normal probability plots for I MEF, III MEF, and I MEF - III MEF; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Example #4: AAAV Maintenance Rates at I and III MEF
- Paired t-test (parametric test) result
  - Pick your preferred software output

Example #4: AAAV Maintenance Rates at I and III MEF
- Sign test confirms I MEF rates higher without having to assume normality

Signed Rank Test Hypotheses and Assumptions
- Most often used to make inferences about the median of an unknown distribution $F$
- Assumptions:
  - Observations are independent and identically distributed from $F$
  - If the distribution is symmetric, median = mean and both can be denoted as $\mu = F^{-1}(0.5)$
- Hypotheses: $H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$
- Signed rank test is more sensitive than the sign test
  - It uses the magnitudes of the deviations from the median as well as the signs in its calculations

Signed Rank Test: Methodology (1)
- First, calculate deviations from the hypothesized mean value: $d_i(\mu_0) = x_i - \mu_0$
- Compute the ranks of the absolute values of the deviations: $r_i(\mu_0) = \mathrm{rank}(|d_i(\mu_0)|)$
  - Delete any observations equal to $\mu_0$
  - If two observations are tied, give them both the average rank
- Calculate $S_+(\mu_0)$, the sum of the ranks of the positive deviations
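These steps translate fairly directly into code. The Python below is a sketch with a hypothetical function name and made-up data; it drops observations equal to the hypothesized value, assigns average ranks to tied absolute deviations, and sums the ranks of the positive deviations.

```python
def signed_rank_statistic(data, mu0):
    """Return (s_plus, n): the sum of ranks of the positive deviations and the
    number of observations left after dropping those equal to mu0."""
    devs = [x - mu0 for x in data if x != mu0]
    n = len(devs)
    # Indices of the deviations, ordered by absolute value
    order = sorted(range(n), key=lambda i: abs(devs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute deviations
        while j + 1 < n and abs(devs[order[j + 1]]) == abs(devs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1           # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    s_plus = sum(r for r, d in zip(ranks, devs) if d > 0)
    return s_plus, n


# Hypothetical observations, for illustration only
obs = [12, 7, 15, 10, 4, 13, 18]
print(signed_rank_statistic(obs, mu0=10))
```

With these values the observation equal to 10 is dropped, and the two deviations of magnitude 3 share rank 2.5.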

Interpretation of the Signed Rank Test Statistic

Signed Rank Test: Methodology (3)
- If $S_+(\mu_0)$ is far from $n(n+1)/4$, then reject the null
- For small sample sizes, tables/software are necessary to determine "far"
- If the sample size is large, use the normal approximation to judge "far"
- The test statistic
  $$Z_+(\mu_0) = \frac{S_+(\mu_0) - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
  is approximately standard normally distributed

Example #2, continued
- Calcs give $S_+(\mu_0) = 407.5$
- Table: reject at significance level $\alpha < 0.01$
- Large sample calcs:
  $$Z_+(\mu_0) = \frac{S_+(\mu_0) - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{407.5 - 30 \times 31/4}{\sqrt{30 \times 31 \times 61/24}} = 3.6$$
- So, p-value $\approx \Phi(-3.6) = 0.0002$
- Conclude median > 10
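The large-sample calculation can be reproduced in Python as a quick check. This is a sketch under the example's own numbers (S+ = 407.5, n = 30); the function names are illustrative and the normal CDF comes from the standard library's error function.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def signed_rank_z(s_plus, n):
    """Normal approximation for the signed rank statistic S+."""
    mean = n * (n + 1) / 4                    # E(S+) under the null
    var = n * (n + 1) * (2 * n + 1) / 24      # Var(S+) under the null
    return (s_plus - mean) / sqrt(var)


z = signed_rank_z(407.5, 30)
p_value = normal_cdf(-z)      # upper-tail p-value
print(round(z, 1), round(p_value, 4))
```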

Signed Rank Variants
- Can also use signed rank test for paired data
  - Do test on differences of the pairs

Signed Rank Test in Other Software: JMP

Signed Rank Test in Other Software: R
- Use the wilcox.test() function for two-sample tests

Rank Sum Test Hypotheses and Assumptions
- One- or two-sided test for the hypotheses of the means of two independent samples
  - $x_1, \ldots, x_n$ is a random sample from $F$, and
  - $y_1, \ldots, y_m$ is a random sample from $G$, where $G(y) = F(y - \theta)$ for some unknown value $\theta$
- Want to test the hypothesis that the distribution of the Xs and Ys is the same except perhaps for the location: $H_0: \theta = 0$ vs. $H_a: \theta \neq 0$

Idea of the Test
- If two samples come from the same distribution, the data ought to be mixed in together
  - If not, it is likely that one sample of data will be systematically higher or lower than the other sample
- Use ranks to quantify where the data fall
  - I.e., smallest observation gets rank 1, next smallest gets rank 2, etc.
- Sum of the ranks quantifies whether the whole sample is systematically different

Calculating the Test Statistic
- In JMP, assuming no ties, the test statistic is
  $$Z = \frac{S_x - E(S_x)}{\sqrt{\mathrm{Var}(S_x)}}$$
  where $E(S_x) = m(m+n+1)/2$ and $\mathrm{Var}(S_x) = mn(m+n+1)/12$
- There are other ways to calculate
- And there are details for handling ties
- We'll just let JMP or R calculate for us
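For intuition, the statistic can be computed by hand on a small data set. The Python sketch below makes two assumptions beyond the slide: the rank sum is taken over the sample of size m, which is what the expectation formula m(m+n+1)/2 implies, and the no-ties variance is used even though the ranking step would assign average ranks to ties. The function name and data are hypothetical.

```python
from math import sqrt


def rank_sum_z(sample, other):
    """Return (s, z): rank sum of `sample` in the pooled data and its
    normal-approximation Z, using the no-ties variance."""
    m, n = len(sample), len(other)
    pooled = sorted(list(sample) + list(other))
    # Map each distinct value to its (average) rank in the pooled sample
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    s = sum(rank_of[v] for v in sample)
    mean = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    return s, (s - mean) / sqrt(var)


# Hypothetical samples with no ties, for illustration only
group_a = [31, 45, 52, 60]
group_b = [22, 25, 28, 33, 40]
print(rank_sum_z(group_a, group_b))
```

A large positive Z indicates that the first sample tends to sit higher in the pooled ordering; a large negative Z indicates the opposite.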

Example #5
- Comparison of age distribution of US vs. non-US Iraq war casualties
  - 4,123 casualties from 3-21-03 to 10-10-07
  - 3,820 US (n) and 301 non-US (m)
[Figure: histograms "Age Distribution of US Casualties" and "Age Distribution of non-US Casualties" (x-axis: Age (years), y-axis: Frequency), plus a plot of Age (years) by group (non-US, US)]

Calculating in JMP
- Analyze > Fit Y by X
  - Be sure X is nominal and Y is continuous
- Under red triangle > Nonparametric > Wilcoxon Test
- Check the calcs:
  $$E(S_x) = m(m+n+1)/2 = 295(4{,}112)/2 = 606{,}520$$
  $$\mathrm{Var}(S_x) = mn(m+n+1)/12 = 295 \times 3{,}816 \times 4{,}112/12 = 385{,}746{,}720$$
  $$Z = \frac{S_x - E(S_x)}{\sqrt{\mathrm{Var}(S_x)}} = \frac{778{,}733.5 - 606{,}520}{\sqrt{385{,}746{,}720}} = 8.768314$$
- Conclusion: Location different
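The check above is easy to replicate. The short Python sketch below simply plugs the slide's values (m = 295, n = 3,816, rank sum 778,733.5) into the no-ties formulas; the function name is made up for illustration.

```python
from math import sqrt


def rank_sum_check(s_x, m, n):
    """Return (expected, variance, z) from the no-ties rank sum formulas."""
    expected = m * (m + n + 1) / 2
    variance = m * n * (m + n + 1) / 12
    z = (s_x - expected) / sqrt(variance)
    return expected, variance, z


expected, variance, z = rank_sum_check(778733.5, m=295, n=3816)
print(expected, variance, round(z, 6))
```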

Calculating in R
- Use the wilcox.test() function with option paired=FALSE

Kruskal-Wallis Test Hypotheses and Assumptions
- Generalization of the rank sum test for $k > 2$ samples
  - $n_T$ observations divided into $k$ groups
- Nonparametric alternative to one-way ANOVA
- Null hypothesis is that all $k$ samples are from the same population
- Assumption: error terms are identically distributed
  - Need not be a normal distribution

Kruskal-Wallis Test: Methodology
- Idea: combine the $k$ samples into one large sample, order them, and rank observations from 1 to $n_T$
- Denote the rank of observation $x_{ij}$ as $r_{ij}$, $1 \le i \le k$, $1 \le j \le n_i$
- Further denote the average ranks as $\bar{r}_1, \ldots, \bar{r}_k$, where
  $$\bar{r}_i = \frac{r_{i1} + \cdots + r_{in_i}}{n_i}$$

Calculate the Test Statistic
- The test statistic is
  $$H = \frac{12}{n_T(n_T+1)} \sum_{i=1}^{k} n_i \bar{r}_i^2 - 3(n_T+1)$$
- Calculate the p-value as $\Pr(X > H)$, where $X$ has a chi-square distribution with $k-1$ df
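A small Python sketch of the whole calculation follows. Everything here is illustrative: the function names and data are hypothetical, H is computed without a tie correction, and the chi-square tail probability is obtained from a series expansion of the incomplete gamma function so that only the standard library is needed.

```python
from math import exp, lgamma, log


def chi2_sf(x, df):
    """Pr(X > x) for X ~ chi-square(df), via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:
        term *= t / (a + k)
        total += term
        k += 1
    lower = total * exp(a * log(t) - t - lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups):
    """Return (H, p_value) for a list of samples, with no tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Map each distinct value to its (average) rank in the pooled sample
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    weighted = 0.0
    for g in groups:
        mean_rank = sum(rank_of[v] for v in g) / len(g)   # average rank of group
        weighted += len(g) * mean_rank**2
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical data for three groups, for illustration only
print(kruskal_wallis([[1, 4, 6], [2, 5, 9], [3, 7, 8]]))
```

A p-value well above the chosen significance level, as in this toy data set, means the null of a common population cannot be rejected.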

Example #6: Helo Back Pain Survey
- Survey of 683 US Navy helo pilots and copilots about back pain in the cockpit
- Question 29: "How often do you have back and/or neck pain during or immediately after a flight?" (1=Rarely, 5=Very Often)
- Compare by question 11: "In what type of helicopter do you have the most flight hours?" (H-46, H-53, TH-57, H-60)
- 553 respondents answered both questions:

Kruskal-Wallis Test in JMP
- Analyze > Fit Y by X > red triangle > Nonparametric > Wilcoxon Test
  - Same as Wilcoxon rank sum, just need more than two levels in X
- Based on the p-value, cannot reject the null

Parametric vs. Nonparametric
[Screenshots: One-way ANOVA output and Kruskal-Wallis Test output]

Calculating in R
- Use the kruskal.test() function
- No statistical difference by type of helo

Good References
- My OA3102 course notes: http://faculty.nps.edu/rdfricke/oa3102.htm
- Hayter, A.J., Probability and Statistics for Engineers and Scientists, Duxbury Press, 2006.
- Wackerly, D., W. Mendenhall, and R.L. Scheaffer, Mathematical Statistics with Applications, Duxbury Press, 2007.
- Conover, W.J., Practical Nonparametric Statistics, Wiley, 1999.

What We Covered
- Discussed advantages and disadvantages of nonparametric tests
- Described some nonparametric tests
  - One sample data
  - Paired data
  - Multiple groups
- Illustrated application with various real-world datasets
- Showed how to implement them in Excel, JMP, and R