Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health


Measurement
What are the 4 levels of measurement discussed?
1. Nominal or classificatory scale: gender, ethnic background
2. Ordinal or ranking scale: hardness of rocks, beauty, military ranks
3. Interval scale: Celsius or Fahrenheit temperature
4. Ratio scale: speed, height, mass or weight

Parametric Assumptions
- The observations must be independent
- The observations must be drawn from normally distributed populations
- These populations must have the same variances

Introduction The theory upon which the two-sample t-test is based requires that the two sampled populations be normal and have equal variances. Many other common statistical procedures have similar assumptions.

Introduction A large body of statistical methods is available that neither require estimating the population mean and variance nor state hypotheses about parameters. These testing procedures are termed nonparametric tests.

Introduction Nonparametric tests may be applied in any situation where we would be justified in employing a parametric test, such as the two-sample t-test, as well as in instances when the assumptions of the latter are untenable.

Introduction If either the parametric or the nonparametric approach is applicable, that is, if the assumptions of the parametric test are met, then the parametric test will generally be more powerful than the nonparametric one.

Why use nonparametric statistics?
- Very small samples (< 20 replicates): there is a high probability of violating the assumption of normality, which leads to spurious Type I (false alarm) errors
- Outliers: they more often lead to spurious Type I errors in parametric statistics
- Nonparametric statistics reduce the data to ordinal ranks, which reduces the impact or leverage of outliers
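The point about ranks and outliers can be illustrated with a small sketch. The numbers below are hypothetical and are not taken from the slides; the helper `ranks` is an illustrative name and assumes there are no tied values.

```python
from statistics import mean

def ranks(values):
    """Rank distinct values from lowest (rank 1) to highest."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical measurements (not from the slides).
sample = [12, 15, 11, 14, 13]
# Same data, but the largest value is recorded as an extreme outlier.
with_outlier = [12, 150, 11, 14, 13]

print("means:", mean(sample), mean(with_outlier))
print("ranks:", ranks(sample), ranks(with_outlier))
```

The mean moves a long way when the outlier is introduced, while the ranks are identical in both cases, so any statistic built on the ranks is unaffected.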

Error
- Type I error: false alarm for a bogus effect. Reject the null hypothesis when it is really true
- Type II error: miss a real effect. Fail to reject the null hypothesis when it is really false
- Type III error ;) Laziness, incompetence, or willful ignorance of the truth

Nonparametric Assumptions
- Observations are independent
- The variable under study has underlying continuity

Nonparametric Methods
For most parametric tests there is at least one nonparametric counterpart. These tests fall into several categories:
- Tests of differences between groups (independent samples)
- Tests of differences between variables (dependent samples)
- Tests of relationships between variables

Nonparametric Methods
- Sign Test
- Wilcoxon Signed-Rank Test
- Mann-Whitney-Wilcoxon Test
- Kruskal-Wallis Test
- Rank Correlation
Adapted from John S. Loucks, St. Edward's University

Sign Test A common application of the sign test involves using a sample of n potential customers to identify a preference for one of two brands of a product. The objective is to determine whether there is a difference in preference between the two items being compared.

Sign Test To record the preference data, we use a plus sign if the individual prefers one brand and a minus sign if the individual prefers the other brand. Because the data are recorded as plus and minus signs, this test is called the sign test.

Example: Hand Cream Test
Sign Test: Large-Sample Case
- As part of a market research study, a sample of 36 consumers was asked to try two brands of hand cream and indicate a preference
- Do the data shown below indicate a significant difference in the consumer preferences for the two brands?

Example: Hand Cream Test
- 18 preferred L'Occitane (+ sign recorded)
- 12 preferred Bath & Body (- sign recorded)
- 6 had no preference
The analysis is based on a sample size of 18 + 12 = 30
Hypotheses
H0: No preference for one brand over the other exists
Ha: A preference for one brand over the other exists

Example: Hand Cream Test
Rejection Rule
Using 0.05 level of significance, reject H0 if z < -1.96 or z > 1.96
Test Statistic
Under H0 the number of + signs has mean 0.5n = 15 and standard deviation sqrt(0.25n) = sqrt(7.5) = 2.74, so
z = (18 - 15)/2.74 = 3/2.74 = 1.095
Conclusion
Do not reject H0. There is insufficient evidence in the sample to conclude that a difference in preference exists for the two brands of hand cream. Fewer than 10 or more than 20 individuals would have to have a preference for a particular brand in order for us to reject H0.
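The arithmetic of the hand cream example can be checked with a minimal sketch of the large-sample sign test. The function name `sign_test_z` is an illustrative choice, not something from the slides; it assumes that the "no preference" responses have already been dropped.

```python
import math

def sign_test_z(n_plus, n_minus):
    """Large-sample sign test under H0: p = 0.5 (ties already discarded)."""
    n = n_plus + n_minus
    mu = 0.5 * n                    # expected number of + signs
    sigma = math.sqrt(0.25 * n)     # standard deviation of the + count
    z = (n_plus - mu) / sigma
    return n, mu, sigma, z

n, mu, sigma, z = sign_test_z(18, 12)
reject = abs(z) > 1.96              # two-sided test at the 0.05 level

# Counts of + signs outside this interval would lead to rejection.
lower = mu - 1.96 * sigma
upper = mu + 1.96 * sigma

print(f"n = {n}, mu = {mu}, sigma = {sigma:.2f}, z = {z:.3f}, reject H0: {reject}")
print(f"reject only if the + count is below {lower:.2f} or above {upper:.2f}")
```

The interval bounds fall between 9 and 10 and between 20 and 21, which is where the "fewer than 10 or more than 20" statement in the conclusion comes from.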

Wilcoxon Signed-Rank Test
The methodology of the parametric matched-sample analysis requires:
- interval data, and
- the assumption that the population of differences between the pairs of observations is normally distributed
If the assumption of normally distributed differences is not appropriate, the Wilcoxon signed-rank test can be used.

Wilcoxon Signed-Rank Test
Preliminary Steps of the Test
1. Compute the differences between the paired observations
2. Discard any differences of zero
3. Rank the absolute values of the differences from lowest to highest. Tied differences are assigned the average ranking of their positions
4. Give the ranks the sign of the original difference in the data
5. Sum the signed ranks
Next, determine whether the sum is significantly different from zero

Example: Express Deliveries Wilcoxon Signed-Rank Test A huge animal hospital has decided to select one of two express delivery services. To test the delivery times of the two services, the Vet sends two reports to a sample of 10 district animal clinics, with one report carried by one service and the other report carried by the second service. Do the data (delivery times in hours) indicate a difference in the two services?

Example: Express Deliveries
District clinic   Overnight (hrs)   NiteFlite (hrs)
Seattle           32                25
Los Angeles       30                24
Boston            19                15
Cleveland         16                15
New York          15                13
Houston           18                15
Atlanta           14                15
St. Louis         10                8
Milwaukee         7                 9
Denver            16                11

Example: Express Deliveries
District clinic   Difference   Rank of |difference|   Signed rank
Seattle           7            10                     +10
Los Angeles       6            9                      +9
Boston            4            7                      +7
Cleveland         1            1.5                    +1.5
New York          2            4                      +4
Houston           3            6                      +6
Atlanta           -1           1.5                    -1.5
St. Louis         2            4                      +4
Milwaukee         -2           4                      -4
Denver            5            8                      +8
Sum of signed ranks: +44

Example: Express Deliveries Hypotheses H 0 : The delivery times of the two services are the same; neither offers faster service than the other H a : Delivery times differ between the two services; recommend the one with the smaller times

Example: Express Deliveries
Rejection Rule
Using 0.05 level of significance, reject H0 if z < -1.96 or z > 1.96
Test Statistic
Under H0 the sum of signed ranks T has mean mu_T = 0 and standard deviation sigma_T = sqrt(n(n+1)(2n+1)/6) = sqrt(10*11*21/6) = 19.62
z = (T - mu_T)/sigma_T = (44 - 0)/19.62 = 2.24
Conclusion
Reject H0. There is sufficient evidence in the sample to conclude that a difference exists in the delivery times provided by the two services. Recommend using the NiteFlite service.
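The steps of the signed-rank test can be sketched on the express delivery data as follows. The helper `average_ranks` is an illustrative name, not part of the slides; the sketch uses the same large-sample normal approximation as the example.

```python
import math

def average_ranks(values):
    """Rank values from lowest to highest; ties share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1           # positions are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Delivery times in hours, in the order listed in the table.
overnight = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
niteflite = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

# Steps 1-2: paired differences, discarding zeros.
diffs = [a - b for a, b in zip(overnight, niteflite) if a != b]
# Step 3: rank the absolute differences, averaging ties.
ranks = average_ranks([abs(d) for d in diffs])
# Steps 4-5: attach the signs and sum.
signed = [r if d > 0 else -r for d, r in zip(diffs, ranks)]
T = sum(signed)

n = len(diffs)
sigma_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)
z = T / sigma_T                         # mean of T is 0 under H0

print(f"T = {T}, sigma_T = {sigma_T:.2f}, z = {z:.2f}")
```

The signed ranks produced here match the table (for example 1.5 for the two differences of absolute value 1, and 4 for the three of absolute value 2).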

Kruskal-Wallis Test
- The Mann-Whitney-Wilcoxon (MWW) test can be used to test whether two populations are identical
- The MWW test has been extended by Kruskal and Wallis for cases of three or more populations
- The Kruskal-Wallis test can be used with ordinal data as well as with interval or ratio data
- It does not require the assumption of normally distributed populations
The hypotheses are:
H0: All populations are identical
Ha: Not all populations are identical

Mann-Whitney U Test

Two-sample rank test Although nonparametric procedures have been proposed for testing differences between the dispersion, or variability, of two populations, none has achieved widespread acceptance.

Differences between independent groups
Two samples: compare the mean value of some variable of interest
- Parametric test: t-test for independent samples
- Nonparametric tests: Wald-Wolfowitz runs test, Mann-Whitney U test, Kolmogorov-Smirnov two-sample test

Mann-Whitney U Test For this test, as for many other nonparametric procedures, the actual measurements are not employed; the ranks of the measurements are used instead. The data may be ranked either from the highest to the lowest or from the lowest to the highest values.

Mann-Whitney U Test
- Nonparametric alternative to the two-sample t-test
- Actual measurements are not used; the ranks of the measurements are used
- Data can be ranked from highest to lowest or from lowest to highest values
- Calculate the Mann-Whitney U statistic (for one sided):
  U = n1n2 + n1(n1+1)/2 - R1

Mann-Whitney U Test
Calculate the Mann-Whitney U statistic (two sided):
  U = n1n2 + n1(n1+1)/2 - R1
  U' = n1n2 - U
n1 and n2 are the numbers of observations in Sample one and Sample two
R1 is the sum of the ranks of the observations in Sample one

Mann-Whitney U Test
Calculate the Mann-Whitney U statistic (two sided), starting from Sample two:
  U' = n2n1 + n2(n2+1)/2 - R2
  U = n1n2 - U'
n1 and n2 are the numbers of observations in Sample one and Sample two
R2 is the sum of the ranks of the observations in Sample two

Example of Mann-Whitney U test Two tailed null hypothesis that there is no difference between the heights of male and female students Ho: Male and female students are the same height HA: Male and female students are not the same height

Example 1
Heights of males (cm)   Rank   Heights of females (cm)   Rank
193                     1      175                       7
188                     2      173                       8
185                     3      168                       10
183                     4      165                       11
180                     5      163                       12
178                     6
170                     9
n1 = 7, R1 = 30; n2 = 5, R2 = 48

U = n1n2 + n1(n1+1)/2 - R1 = (7)(5) + (7)(8)/2 - 30 = 35 + 28 - 30 = 33
U' = n1n2 - U = (7)(5) - 33 = 2
U 0.05(2),7,5 = U 0.05(2),5,7 = 30
As 33 > 30, Ho is rejected
0.01 < P(U >= 33 or U' <= 2) < 0.02

Calculation for z-statistics
E(U) = (n1n2)/2 = (7*5)/2 = 17.5
S(U) = sqrt(n1n2(n1+n2+1)/12) = sqrt(7*5*(7+5+1)/12) = 6.16
z = [U' - E(U)]/S(U) = (2 - 17.5)/6.16 = -2.516
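The height example can be reproduced with a short sketch. It ranks the pooled heights from highest to lowest, as the example does, and assumes there are no tied values (true for this data). The variable names are illustrative.

```python
import math

males = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]

# Rank the pooled heights from highest (rank 1) to lowest; no ties in this data.
pooled = sorted(males + females, reverse=True)
rank_of = {height: i + 1 for i, height in enumerate(pooled)}

n1, n2 = len(males), len(females)
R1 = sum(rank_of[h] for h in males)
R2 = sum(rank_of[h] for h in females)

U = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U_prime = n1 * n2 - U

# Normal approximation, using the smaller of the two statistics as in the slide.
mean_U = n1 * n2 / 2
sd_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (U_prime - mean_U) / sd_U

print(f"R1 = {R1}, R2 = {R2}, U = {U}, U' = {U_prime}")
print(f"E(U) = {mean_U}, S(U) = {sd_U:.2f}, z = {z:.2f}")
```

Carrying the unrounded standard deviation through gives z of about -2.52, in line with the -2.516 obtained from the rounded value 6.16.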

Rejection Rule
Using 0.05 level of significance, reject H0 if z < -1.96 or z > 1.96
Conclusion
Reject H0. There is a significant difference between the heights of male and female students.

Example of Mann-Whitney U test Ho: The performance of students is the same under the two teaching assistants Ha: Students do not perform equally well under the two teaching assistants α = 0.05

Example 2
Grades under Teaching Assistant A (n1 = 11): A, A, A, A-, B, B, C+, C+, C, C, C-
Grades under Teaching Assistant B (n2 = 14): A, A, B+, B+, B, B-, C, C, C-, D, D, D, D, D-
The rank of each grade and the rank sums R1 and R2 are to be determined.

Example 2
Teaching Assistant A          Teaching Assistant B
Grade   Rank of grade         Grade   Rank of grade
A       3                     A       3
A       3                     A       3
A       3                     B+      7.5
A-      6                     B+      7.5
B       10                    B       10
B       10                    B-      12
C+      13.5                  C       16.5
C+      13.5                  C       16.5
C       16.5                  C-      19.5
C       16.5                  D       22.5
C-      19.5                  D       22.5
                              D       22.5
                              D       22.5
                              D-      25
n1 = 11, R1 = 114.5           n2 = 14, R2 = 210.5

U = n1n2 + n1(n1+1)/2 - R1 = (11)(14) + (11)(12)/2 - 114.5 = 154 + 66 - 114.5 = 105.5
U' = n1n2 - U = (11)(14) - 105.5 = 48.5
U 0.05(2),11,14 = 114
As 105.5 < 114, do not reject H0
0.10 < P(U >= 105.5 or U' <= 48.5) < 0.20

Calculation for z-statistics
E(U) = (n1n2)/2 = (11*14)/2 = 77
S(U) = sqrt(n1n2(n1+n2+1)/12) = sqrt(11*14*(11+14+1)/12) = 18.27
z = [U' - E(U)]/S(U) = (48.5 - 77)/18.27 = -1.56
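Example 2 differs from Example 1 mainly in its tied ranks, so a sketch of how the averaged ranks arise may help. The list `GRADE_ORDER` and the helper `tied_ranks` are illustrative names; the standard deviation follows the formula used in the slide, without any correction for ties.

```python
import math

# Grades from best to worst; rank 1 goes to the best grade, as in the example.
GRADE_ORDER = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "D-"]

ta_a = ["A", "A", "A", "A-", "B", "B", "C+", "C+", "C", "C", "C-"]
ta_b = ["A", "A", "B+", "B+", "B", "B-", "C", "C", "C-", "D", "D", "D", "D", "D-"]

def tied_ranks(grades):
    """Map each grade to the average of the rank positions it occupies."""
    ranks, next_rank = {}, 1
    for grade in GRADE_ORDER:
        count = grades.count(grade)
        if count:
            ranks[grade] = next_rank + (count - 1) / 2
            next_rank += count
    return ranks

rank_of = tied_ranks(ta_a + ta_b)       # ranks are assigned on the pooled sample
n1, n2 = len(ta_a), len(ta_b)
R1 = sum(rank_of[g] for g in ta_a)
R2 = sum(rank_of[g] for g in ta_b)

U = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U_prime = n1 * n2 - U

mean_U = n1 * n2 / 2
sd_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # slide's formula, no tie correction
z = (U_prime - mean_U) / sd_U

print(f"R1 = {R1}, R2 = {R2}, U = {U}, U' = {U_prime}, z = {z:.2f}")
```

For instance, the five A grades occupy positions 1 to 5 in the pooled ranking and so each receives rank 3, and the four D grades occupy positions 21 to 24 and each receives 22.5.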

Rejection Rule
Using 0.05 level of significance, reject H0 if z < -1.96 or z > 1.96
Conclusion
Cannot reject H0. There is insufficient evidence to conclude that the performance of students differs between the two teaching assistants.

Differences between independent groups
Multiple groups
- Parametric: analysis of variance (ANOVA/MANOVA)
- Nonparametric: Kruskal-Wallis analysis of ranks, median test

Differences between dependent groups
Compare two variables measured in the same sample
- Parametric: t-test for dependent samples
- Nonparametric: sign test, Wilcoxon's matched pairs test
If more than two variables are measured in the same sample
- Parametric: repeated measures ANOVA
- Nonparametric: Friedman's two-way analysis of variance, Cochran Q

Relationships between variables
- Parametric: correlation coefficient
- Nonparametric: Spearman R, Kendall tau, coefficient gamma
- Nonparametric, when the two variables of interest are categorical: chi-square, phi coefficient, Fisher exact test
- Nonparametric, for agreement among several rankings: Kendall coefficient of concordance

Summary Table of Statistical Tests
Categorical or nominal
- 1 sample: χ² or binomial
- 2 samples, independent: χ²
- 2 samples, dependent: McNemar's χ²
- K samples (i.e., > 2), independent: χ²
- K samples, dependent: Cochran's Q
Rank or ordinal
- 2 samples, independent: Mann-Whitney U
- 2 samples, dependent: Wilcoxon matched pairs signed ranks
- K samples, independent: Kruskal-Wallis H
- K samples, dependent: Friedman's ANOVA
- Correlation: Spearman's rho
Parametric (interval and ratio)
- 1 sample: z test or t test
- 2 samples, independent: t test between groups
- 2 samples, dependent: t test within groups
- K samples, independent: 1-way ANOVA between groups
- K samples, dependent: 1-way ANOVA (within or repeated measures)
- Correlation: Pearson's r
- Factorial (2-way) ANOVA

Advantages of Nonparametric Tests
- Probability statements obtained from most nonparametric statistics are exact probabilities, regardless of the shape of the population distribution from which the random sample was drawn
- If sample sizes as small as N = 6 are used, there is no alternative to using a nonparametric test unless the nature of the population distribution is known exactly

Advantages of Nonparametric Tests
- They can treat samples made up of observations from several different populations
- They can treat data that are inherently in ranks, as well as data whose seemingly numerical scores have only the strength of ranks
- They are available to treat data that are classificatory
- They are easier to learn and apply than parametric tests

Criticisms of Nonparametric Procedures
- Losing precision / wasteful of data
- Low power
- False sense of security
- Lack of software
- Testing distributions only
- Higher-order interactions not dealt with

A good tree will bear good fruits