Intro to Parametric & Nonparametric Statistics

Similar documents
CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Textbook Examples of. SPSS Procedure

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

NON-PARAMETRIC STATISTICS * (

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

Nonparametric Statistics

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA)

NAG Library Chapter Introduction. G08 Nonparametric Statistics

What Are Nonparametric Statistics and When Do You Use Them? Jennifer Catrambone

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

Types of Statistical Tests DR. MIKE MARRAPODI

4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

Review of Statistics 101

Statistics: revision

Non-parametric (Distribution-free) approaches p188 CN

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh

Intuitive Biostatistics: Choosing a statistical test

Contents. Acknowledgments. xix

Non-parametric tests, part A:

NAG Library Chapter Introduction. g08 Nonparametric Statistics

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

CDA Chapter 3 part II

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Degrees of freedom df=1. Limitations OR in SPSS LIM: Knowing σ and µ is unlikely in large

Understand the difference between symmetric and asymmetric measures

In many situations, there is a non-parametric test that corresponds to the standard test, as described below:

Glossary for the Triola Statistics Series

Mitosis Data Analysis: Testing Statistical Hypotheses By Dana Krempels, Ph.D. and Steven Green, Ph.D.

Analysis of variance (ANOVA) Comparing the means of more than two groups

Exam details. Final Review Session. Things to Review

Small n, σ known or unknown, underlying nongaussian

Introduction to Statistical Analysis

Agonistic Display in Betta splendens: Data Analysis I. Betta splendens Research: Parametric or Non-parametric Data?

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

Inferences About the Difference Between Two Means

Rank-Based Methods. Lukas Meier

psychological statistics

Data Analysis: Agonistic Display in Betta splendens I. Betta splendens Research: Parametric or Non-parametric Data?

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

Lecture 7: Hypothesis Testing and ANOVA

Non-parametric methods

Basic Statistical Analysis

Basic Business Statistics, 10/e

Non-Parametric Statistics: When Normal Isn t Good Enough"

Note: k = the # of conditions n = # of data points in a condition N = total # of data points

Module 9: Nonparametric Statistics Statistics (OA3102)

Analyzing Small Sample Experimental Data

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

Contents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47

Bivariate Relationships Between Variables

What is a Hypothesis?

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

N Utilization of Nursing Research in Advanced Practice, Summer 2008

STAT Section 3.4: The Sign Test. The sign test, as we will typically use it, is a method for analyzing paired data.

Stat 101 Exam 1 Important Formulas and Concepts 1

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Rama Nada. -Ensherah Mokheemer. 1 P a g e

Unit 14: Nonparametric Statistical Methods

Data Analysis as a Decision Making Process

Psych 230. Psychological Measurement and Statistics

Workshop Research Methods and Statistical Analysis

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression

STAT Section 2.1: Basic Inference. Basic Definitions

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

Practical Biostatistics

My data doesn t look like that..

Background to Statistics

ANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula

E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou.

Chapter 14. One-Way Analysis of Variance for Independent Samples Part 2

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

STATISTICS ( CODE NO. 08 ) PAPER I PART - I

AP Statistics Cumulative AP Exam Study Guide

Distribution-Free Procedures (Devore Chapter Fifteen)

= 1 i. normal approximation to χ 2 df > df

Discrete Multivariate Statistics

STAT Section 5.8: Block Designs

Biostatistics 270 Kruskal-Wallis Test 1. Kruskal-Wallis Test

H0: Tested by k-grp ANOVA

13: Additional ANOVA Topics. Post hoc Comparisons

Physics 509: Non-Parametric Statistics and Correlation Testing

ANOVA - analysis of variance - used to compare the means of several populations.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

Chi-Square. Heibatollah Baghi, and Mastee Badii

Statistics and Measurement Concepts with OpenStat

Violating the normal distribution assumption. So what do you do if the data are not normal and you still need to perform a test?

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

MANOVA is an extension of the univariate ANOVA as it involves more than one Dependent Variable (DV). The following are assumptions for using MANOVA:

Introduction to the Analysis of Variance (ANOVA)

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

A Re-Introduction to General Linear Models

Fish SR P Diff Sgn rank Fish SR P Diff Sng rank

Transcription:

Kinds of variable The classics & some others Intro to Parametric & Nonparametric Statistics Kinds of variables & why we care Kinds & definitions of nonparametric statistics Where parametric stats come from Consequences of parametric assumptions Organizing the statistics & models we will cover in this class Common arguments for using nonparametric stats Common arguments against using nonparametric stats Using ranks instead of values to compute statistics Labels aka identifiers values may be alphabetic, numeric or symbolic different data values represent unique vs. duplicate cases, trials, or events e.g., UNL ID# Nominal aka categorical, qualitative values may be alphabetic, numeric or symbolic different data values represent different kinds e.g., species Ordinal aka rank order data, ordered data, seriated data values may be alphabetic or numeric different data values represent different amounts only trust the ordinal information in the value don t trust the spacing or relative difference information has no meaningful 0 don t trust ratio or proportional information e.g., 10 best cities to live in has ordinal info 1 st is better than 3 rd no interval info 1 st & 3 rd not as different as 5 th & 7 th no interval info 1st & 5rd not more different than 5th & 7th no ratio info no 0 th place no prop info 2 nd not twice as good as 4 th no prop dif info 1 st & 5 th not twice as different as 1 st & 3 rd 1

Interval aka numerical, equidistant values values are numeric different data values represent different amounts all intervals of a given extent represent the same difference anywhere along the continuum trust the ordinal information in the value trust the spacing or relative difference information has no meaningful 0 (0 value is arbitrary) don t trust ratio or proportional information e.g., # correct on a 10-item spelling test of 20 study words has ordinal info 8 is better than 6 has interval info 8 & 6 are as different as 5 & 3 has prop dif info 2 & 6 are twice as different as 3 & 5 no ratio info 0 not mean can t spell any of 20 words no proportional info 8 not twice as good as 4 20 30 40 50 60 Ordinal Measure Positive monotonic trace more means more but doesn t tell how much more 20 30 40 50 60 Interval Measure Linear trace more tells how much more but only difference not proportion y = mx + c Nearly Interval Scale good summative scales how close is close enough 20 30 40 50 60 20 30 40 50 60 Limited Interval Scale provides interval data only over part of the possible range of the scale values / construct summative/aggregated scales 2

Binary Items Nominal for some constructs different values mean different kinds e.g., male = 1 famale = 2 Ordinal for some constructs can rank-order the categories e.g., fail = 0 pass = 1 Interval only one interval, so all intervals of a given extent represent the same difference anywhere along the continuum So, you will see binary variables treated as categorical or numeric, depending on the research question and statistical model. Ratio aka numerical, real numbers values are numeric different data values represent different amounts trust the ordinal information in the value trust the spacing or relative difference information has a meaningful 0 trust ratio or proportional information e.g., number of treatment visits has ordinal info 9 is better than 7 has interval info 9 & 6 are as different as 5 & 2 has prop dif info 2 & 6 are twice as different as 3 & 5 has ratio info 0 does mean didn t visit has proportional info 8 is twice as many as 4 Ratio Measure Pretty uncommon in Psyc & social sciences tend to use arbitrary scales usually without a zero 20 5-point items 20-100 0 20 40 0 10 20 30 40 50 60 Linear scale & 0 means none Linear trace w/ 0 more how much more y = mx + c 3

Kinds of variables Why we care Reasonable mathematical operations Nominal = Ordinal < = > Interval < = > + - (see note below about * / ) Data Distributions We often want to know the shape of a data distribution. Nominal can t do no prescribed value order dogs cats fish rats vs. fish cats dogs rats Ordinal can t do well prescribed order but not spacing Ratio < = > + - * / Note: For interval data we cannot * or / numbers, but can do so with differences. E.g., while 4 can not be said to be twice 2, 8 & 4 are twice as different as are 5 & 3. 10 20 30 40 50 60 10 20 30 40 50 60 Interval & Ratio prescribed order and spacing 10 20 30 40 50 60 There are two kinds of statistics commonly referred to as nonparametric... Statistics for quantitative variables w/out making assumptions about the form of the underlying data distribution univariate stats -- median & IQR univariate stat tests -- 1-sample test of median bivariate -- analogs of the correlations, t-tests & ANOVAs Statistics for qualitative variables univariate -- mode & #categories univaraite stat tests -- goodness-of-fit X² bivariate -- Pearson s Contingency Table X² Have to be careful!! X² tests are actually parametric (they assume an underlying normal distribution more later) 4

Defining nonparametric statistics... Nonparametric statistics (also called distribution free statistics ) are those that can describe some attribute of a population, test hypotheses about that attribute, its relationship with some other attribute, or differences on that attribute across populations, across time or across related constructs, that require no assumptions about the form of the population data distribution(s) nor require interval level measurement. Now, about that last part that require no assumptions about the form of the population data distribution(s) nor require interval level measurement. This is where things get a little dicey. Today we get just a taste, but we will examine this very carefully after you know the relevant models Most of the statistics you know have a fairly simple computational formula. As examples... Here are formulas for two familiar parametric statistics: The mean... M = X / N The standard S = ( X - M ) 2 deviation... N But where to these formulas come from??? As you ve heard many times, computing the mean and standard deviation assumes the data are drawn from a population that is normally distributed. What does this really mean??? 5

formula for the normal distribution: e -( x - )² / 2 ² ƒ(x) = -------------------- 2π For a given mean ( ) and standard deviation ( ), plug in any value of x to receive the proportional frequency of that value in that particular normal distribution. First Since the computational formulas for the mean and the std are derived based upon the assumption that the normal distribution formula describes the data distribution if the data are not normally distributed then the formulas for the mean and the std don t provide a description of the center & spread of the population distribution. The computational formula for the mean and std are derived from this formula. Same goes for all the formulae that you know!! Pearson s corr, Z-tests, t-tests, F-tests, X 2 tests, etc.. Second Since the computational formulas for the mean and the std use +, -, * and /, they assume the data are measured on an interval scale (such that equal differences anywhere along the measured continuum represent the same difference in construct values, e.g., scores of 2 & 6 are equally different than scores of 32 & 36) if the data are not measured on an interval scale then the formulas for the mean and the std don t provide a description of the center & spread of the population distribution. Same goes for all the formulae that you know!! Pearson s corr, Z-tests, t-tests, F-tests, X 2 tests, etc.. 6

Normally distributed data Organizing nonparametric statistics... 1-sample Z tests Known σ Z scores Known σ 2-sample Z tests Known σ X 2 tests ND 2 df = k-1 or (k-1)(j-1) rtests bivnd Nonparametric statistics (also called distribution free statistics ) are those that can describe some attribute of a population,, test hypotheses about that attribute, its relationship with some other attribute, or differences on that attribute across populations, across time, or across related constructs, that require no assumptions about the form of the population data distribution(s) nor require interval level measurement. describe some attribute of a population univariate stats 1-sample t tests Estimated σ df = N-1 2-sample t tests Estimated σ df = N-1 F tests X 2 / X 2 df = k 1 & N-k test hypotheses about that attribute its relationship with some other attribute differences on that attribute across populations across time, or across related constructs univariate statistical tests tests of association between groups comparisons within-groups comparisons Statistics We Will Consider Parametric Nonparametric DV Categorical Interval/ND Ordinal/~ND univariate stats mode, #cats mean, std median, IQR univariate tests gof X 2 1-grp t-test 1-grp Mdn test association X 2 Pearson s r Spearman s r Kendall s Tau 2 bg X 2 t- / F-test M-W K-W Mdn k bg X 2 F-test K-W Mdn 2wg McNem Crn s t- / F-test Wil s Fried s kwg Crn s F-test Friedman s M-W -- Mann-Whitney U-Test Wil s -- Wilcoxin s Test Fried s -- Friedman s F-test K-W -- Kruskal-Wallis Test Mdn -- Median Test McNem -- McNemar s X 2 Crn s Cochran s Test 7

Things to notice X 2 is used for tests of association between categorical variables & for between groups comparisons with a categorical DV These WG-comparisons can only be used with binary DVs k-condition tests can also be used for 2-condition situations Common reasons/situations FOR using Nonparametric stats & a caveat to consider Data are not normally distributed r, Z, t, F and related statistics are rather robust to many violations of these assumptions Data are not measured on an interval scale. Most psychological data are measured somewhere between ordinal and interval levels of measurement. The good news is that the regular stats are pretty robust to this influence, since the rank order information is the most influential (especially for correlation-type analyses). Sample size is too small for regular stats Do we really want to make important decisions based on a sample that is so small that we change the statistical models we use? Remember the importance of sample size to stability. Common reasons/situations AGAINST using Nonparametric stats & a caveat to consider Robustness of parametric statistics to most violated assumptions Difficult to know if the violations or a particular data set are enough to produce bias in the parametric statistics. One approach is to show convergence between parametric and nonparametric analyses of the data. Poorer power/sensitivity of nonpar statistics (make Type II errors) Parametric stats are only more powerful when the assumptions upon which they are based are well-met. If assumptions are violated then nonpar statistics are more powerful. Mostly limited to uni- and bivariate analyses Most research questions are bivariate. If the bivariate results of parametric and nonparametric analyses converge, then there may be increased confidence in the parametric multivariate results. 8

continued Not an integrated family of models, like GLM There are only 2 families -- tests based on summed ranks and tests using 2 (including tests of medians), most of which converge to Z-tests in their large sample versions. H0:s not parallel with those of parametric tests This argument applies best to comparisons of groups using quantitative DVs. For these types of data, although the null is that the distributions are equivalent (rather than that the centers are similarly positioned H0: for t-test and ANOVA), if the spread and symmetry of the distributions are similar (as is often the case & the assumption of t-test and ANOVA), then the centers (medians instead of means) are what is being compared by the significance tests. In other words, the H0:s are similar when the two sets of analyses make the same assumptions. Working with Ranks instead of Values All of the nonparametric statistics for use with quantitative variables work with the ranks of the variable values, rather than the values themselves. S# score rank 1 12 2 20 3 12 4 10 5 17 6 8 3.5 6 3.5 2 5 1 Why convert values to ranks? Converting values to ranks smallest value gets the smallest rank highest rank = number of cases tied values get the mean of the involved ranks cases 1 & 3 are tied for 3rd & 4th ranks, so both get a rank of 3.5 Because distributions of ranks are better behaved than are distributions of values (unless there are many ties). 9