My data doesn't look like that...


Testing assumptions
Bill Pine

We have made a big deal about testing model assumptions each week.
- We test whether the data are NORMALLY DISTRIBUTED.
- We test whether the variances are HOMOGENEOUS.

What do we do if our data don't meet these assumptions? Or do we change our data to something else?

Nonparametric statistics

What are parametric statistics?
This is what we have been talking about for the last few weeks. Parametric statistics assume that the distributions of the variables being assessed belong to known parameterized families of probability distributions. ANOVA, for example, assumes that the underlying distributions are normal and that the variances of the distributions being compared are similar.

What are nonparametric statistics?
Nonparametric statistics make no assumptions about the frequency distributions of the variables being assessed. They do make some assumptions in some cases, such as that the data are independent and sometimes that the variances are equal, so you need to check the assumptions of each test. "Nonparametric" does not mean that the models lack parameters.

Does this solve all my problems?
Nope.

Nonparametric statistics
- They do provide additional options for inferential testing.
- Power is lower than that of the equivalent parametric test when that test's assumptions are met.
- Many common parametric tests have nonparametric analogs.
- Chi-square, Mann-Whitney, Kruskal-Wallis, and the 2-sample Kolmogorov-Smirnov test are all common nonparametric tests.
- These tests are all readily found in most stats programs.

Chi-square ($\chi^2$)
- Commonly used to compare the observed vs. expected frequencies of occurrence.
- The test is often called Pearson's chi-square.
- The test statistic only approximately follows a chi-square distribution.
- There are a variety of chi-square tests.
- The null hypothesis is that the relative frequencies of occurrence of observed events follow a specified frequency distribution.
- The statistic is calculated by finding the difference between each observed and expected frequency, squaring it, dividing by the expected frequency, and summing the results.

Chi-square

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency.

Example: Test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency (50 men and 50 women expected). You go out and collect data and find that there are 45 men and 55 women in your sample.

$$\chi^2 = \frac{(45 - 50)^2}{50} + \frac{(55 - 50)^2}{50} = 1$$

OK, so the test value is 1. Go to a chi-square table (or use the Excel CHIDIST function) and look up the probability for a test statistic of 1 when the df is also 1 (df = groups - 1). The probability of observing a value of 1 or more if men and women are equally numerous in the population is about 0.32. Is this higher or lower than your a priori alpha value?
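The men/women example can be checked by hand with a few lines of Python. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom (a table or CHIDIST covers the general case).

```python
import math

def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_value_df1(statistic):
    """Upper-tail probability for a chi-square variable with 1 df.

    With 1 df the statistic is a squared standard normal, so
    P(X >= x) = erfc(sqrt(x / 2)). Do not use this for df > 1.
    """
    return math.erfc(math.sqrt(statistic / 2.0))

observed = [45, 55]  # men, women counted in the sample
expected = [50, 50]  # equal frequencies under the null hypothesis

chi2 = pearson_chi_square(observed, expected)
df = len(observed) - 1  # df = groups - 1
p_value = chi_square_p_value_df1(chi2)

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

With an a priori alpha of 0.05, a p-value near 0.32 means we would not reject the hypothesis of equal frequencies.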

Chi-square
This chi-square approximation does not hold if the expected frequencies are low. A general rule of thumb is that no more than 10% of the events should have expected frequencies below 5. If the df is 1, expected frequencies should be more than 10.

Wilcoxon sign-rank test
- Nonparametric alternative to the paired Student's t-test or one-way ANOVA.
- Can be used for two related samples or repeated measurements on a single sample.
- The data might not be normal, but might be symmetric.
- Often used to test before and after an experimental manipulation.
- The null hypothesis is that the difference in the response is zero.

Example (Sokal and Rohlf 13.7): measurements of the length of the front right leg in two samples of bugs (the table shows only a portion of the data). Rank the data from the two samples combined, using average ranks for ties. Note that pooling and ranking two independent samples like this is the two-sample (Mann-Whitney U) form of the Wilcoxon test, not the paired form.

Sample A (n1 = 16)   Rank (R)     Sample B (n2 = 10)   Rank (R)
104                  2            100                  1
109                  7            105                  3
112                  9            107                  4.5
114                  10           107                  4.5
116                  11.5         108                  6
118                  13.5         111                  8
...                  ...          ...                  ...
Sum of ranks         259.5        Sum of ranks         91.5
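Ranking the pooled data with average ranks for ties can be sketched as follows. The helper name `average_ranks` is hypothetical, and only the twelve values visible in the slide's partial table are used, so the ranks of 116 and 118 come out as 11 and 12 here rather than the 11.5 and 13.5 that the full 26-observation data set gives.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# Only the portion of each sample shown on the slide.
sample_a = [104, 109, 112, 114, 116, 118]
sample_b = [100, 105, 107, 107, 108, 111]

ranks = average_ranks(sample_a + sample_b)  # rank the combined data
ranks_a = ranks[:len(sample_a)]
ranks_b = ranks[len(sample_a):]

print("Sample A ranks:", ranks_a)
print("Sample B ranks:", ranks_b)
```

The two tied values of 107 share the average of positions 4 and 5, which is the 4.5 shown in the table.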

Wilcoxon sign-rank test

$$C = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - \sum_{i=1}^{n_2} R_i = 123.5$$

$n_1$ (16) is the size of the larger sample, $n_2$ (10) is the size of the smaller, and the ranks $R_i$ summed are those of the smaller sample. Look up the value in a table. The samples are significantly different.

Kruskal-Wallis
- Intuitively identical to a one-way ANOVA with the data replaced by their ranks.
- Does not assume normality, but does assume that the population variation among groups is equal.
- Works for multiple groups.

Kolmogorov-Smirnov (K-S test)
- Used to determine whether two underlying probability distributions differ.
- The two-sample K-S test is one of the most useful and general nonparametric methods for comparing two samples.
- Almost too good a test: nearly always significant at high sample sizes.
- Sensitive to both the location and the shape of the two distributions being tested.
- Be cautious in its use.
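Going back to the rank-sum example, the statistic C can be reproduced from the sample sizes and the rank sum of the smaller sample. This is a small sketch with a hypothetical function name; taking the larger of C and n1*n2 - C as the value to look up is an assumption based on the usual Mann-Whitney convention, not something stated on the slide.

```python
def rank_sum_statistic(n1, n2, rank_sum_smaller):
    """C = n1*n2 + n2*(n2 + 1)/2 - (sum of ranks of the smaller sample)."""
    return n1 * n2 + n2 * (n2 + 1) / 2.0 - rank_sum_smaller

n1, n2 = 16, 10        # larger and smaller sample sizes
rank_sum_a = 259.5     # sum of ranks, sample A
rank_sum_b = 91.5      # sum of ranks, sample B (the smaller sample)

# Sanity check: all ranks 1..N must add up to N(N + 1)/2.
total_n = n1 + n2
ranks_add_up = (rank_sum_a + rank_sum_b) == total_n * (total_n + 1) / 2.0

c = rank_sum_statistic(n1, n2, rank_sum_b)  # 160 + 55 - 91.5 = 123.5
u = max(c, n1 * n2 - c)                     # assumed convention: use the larger

print(f"C = {c}, n1*n2 - C = {n1 * n2 - c}, value to look up = {u}")
```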

BREAK!

Disclaimer
The remaining lecture is not traditional hypothesis-based statistical testing. This lecture is about comparing a description of how the world works (a model) to data that we collect. It is based on material that is not from a stats book but from the fisheries stock assessment literature and books like The Ecological Detective.

Hmmm.
- Why do I find myself not using the techniques that we have talked about in this class as much as I once did?
- What are ecological models, and what can they tell us?
- If I can build a model to describe my data, then compare how well that model fits a second data set, isn't that basically what I'm doing with all of these tests we have talked about?
- How do I build or fit models to my data?

Building models
- Models are simplified abstractions of reality.
- Regression, ANOVA, and ANCOVA are all models.
- Beverton-Holt stock-recruitment curves and Jolly-Seber are also models.

Building models
To be useful, models must mimic the real world. Building useful models has two steps:
- Selection of the model structure that appears most appropriate to the problem at hand
- Selection of appropriate parameters for this model

We have talked a lot about parameters. A parameter is a quantitative property of a system that is assumed to remain constant over some defined time span of historical data and future prediction.

Requirements for parameter estimation:
- A formal model with parameters to be estimated
- Data to be used to estimate the parameters
- A way to judge the goodness of fit to the data of the model and parameter combinations

Estimating parameters involves finding the values that provide the best fit between the model and the available data according to the criterion.
Confrontation of models with data: The Ecological Detective.

Building models
- Finding parameters that fit does not imply that the model will make correct predictions or that there is only one combination of parameters that will fit.
- A good fit can be obtained with incorrect parameters.
- The model or data structure can be wrong.
- We've been building models all semester.

Simple model

$$Y_i = \alpha + \beta x_i + \varepsilon_i$$

What is this? A linear regression model. It is linear because the model terms appear in additive fashion. The usual test is whether the slope $\beta$ is equal to zero.

Simple model

$$Y_i = \alpha + \beta x_i + \varepsilon_i$$

What is this? A linear regression model.
What is the last term? The residual: the difference between the value of Y predicted from the equation and the true value of Y in the population.
How do we fit a regression? We minimize the difference between what we observe and what we predict. We minimize the error term.

Sums of squares

$$SS = \sum (\text{obs} - \text{pred})^2$$

Parameter estimates are those that minimize the sum of squared differences between the predicted observations and the observed data. The prediction is based on the MODEL and the PARAMETERS.
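To make the sums-of-squares idea concrete, here is a minimal sketch of an ordinary least squares fit of a straight line. The data values and function names are invented for illustration; the closed-form estimates are the standard ones for simple linear regression.

```python
def sum_of_squares(alpha, beta, x, y):
    """SS = sum of (observed - predicted)^2 for the line alpha + beta * x."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

def fit_line(x, y):
    """Closed-form least squares estimates of intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical data, not from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_line(x, y)
ss_best = sum_of_squares(alpha_hat, beta_hat, x, y)

# Any other parameter combination gives a larger SS.
ss_shifted_slope = sum_of_squares(alpha_hat, beta_hat + 0.1, x, y)
ss_shifted_intercept = sum_of_squares(alpha_hat + 0.1, beta_hat, x, y)

print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, SS = {ss_best:.4f}")
```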

Linear least squares: One parameter
[Figure: recruits plotted against spawners; Hilborn and Walters 1992, Ch. 6]

Assume a simple linear stock-recruitment relationship:

Recruits = b × spawners

- b is recruits per spawner.
- We want to find the b that minimizes the differences between the observed and expected recruits.

Rewrite the model in regression form:

$$R_t = b S_t$$

$$SS = \sum_t (\hat{R}_t - R_t)^2$$

where $\hat{R}_t$ is the recruitment predicted by the model and $R_t$ is the observed recruitment.
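One way to find the b that minimizes SS is simply to try many candidate values and keep the best one. The sketch below does that on made-up spawner and recruit numbers (not the Hilborn and Walters data) and compares the result with the closed-form answer for a line through the origin, b = sum(S*R) / sum(S^2).

```python
def sum_of_squares(b, spawners, recruits):
    """SS for the one-parameter model: predicted recruits = b * spawners."""
    return sum((b * s - r) ** 2 for s, r in zip(spawners, recruits))

# Hypothetical data for illustration only.
spawners = [2.0, 4.0, 6.0, 8.0, 10.0]
recruits = [5.0, 9.0, 16.0, 18.0, 24.0]

# Brute force: evaluate SS for slopes 0.00, 0.01, ..., 4.00.
candidates = [i * 0.01 for i in range(401)]
b_grid = min(candidates, key=lambda b: sum_of_squares(b, spawners, recruits))

# Closed-form least squares slope for a line through the origin.
b_closed = (sum(s * r for s, r in zip(spawners, recruits))
            / sum(s * s for s in spawners))

print(f"grid search b = {b_grid:.2f}, closed form b = {b_closed:.4f}")
```

The grid answer can only be as precise as the grid spacing, which is why a solver or the closed form is preferable once the idea is clear.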

Linear least squares: One parameter
Plug different values of b into the regression form $R_t = b S_t$ until you get the smallest error.
[Figure: sums of squares plotted against slope b]
Let's demo this in Excel to see if it makes more sense.

What if your data looks like this?
[Figure: weight (g) plotted against length (mm)]

$$y_i = \alpha x_i^{b} + \varepsilon_i$$

What if your data looks like this?
Transform the data ($\log_{10}$). The curvilinear relation between weight and length becomes straightened, which allows for estimation of a and b by means of linear regression (AFS Fisheries Techniques book; an easier approach with a calculator).
[Figure: weight (g) plotted against length (mm)]
[Figure: $\log_{10}$ weight plotted against $\log_{10}$ length, with fitted line y = 2.9986x - 4.9399]

Are these the same? Why don't we work with the raw data?
- Transformed: a ≈ 0.000011, b ≈ 2.999
- Raw: a ≈ 0.00001, b ≈ 3.0

In this case the transformation works pretty well; the estimates are about the same between the two methods (they should produce an identical fit if the true model is a power function).
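The log-log trick can be sketched like this: regress log10(weight) on log10(length), then back-transform the intercept to get a. The data below are generated from an exact power curve with assumed values a = 0.00001 and b = 3, so the fit should recover them; real data with scatter would not match this closely. The function name is hypothetical.

```python
import math

def fit_power_law(lengths, weights):
    """Fit weight = a * length**b by linear regression on log10 values."""
    log_x = [math.log10(v) for v in lengths]
    log_y = [math.log10(v) for v in weights]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    slope = sxy / sxx                    # this is b
    intercept = mean_y - slope * mean_x  # this is log10(a)
    return 10.0 ** intercept, slope

# Synthetic, noise-free data from assumed parameter values.
true_a, true_b = 1.0e-5, 3.0
lengths = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
weights = [true_a * length ** true_b for length in lengths]

a_hat, b_hat = fit_power_law(lengths, weights)
print(f"a = {a_hat:.3g}, b = {b_hat:.4f}")
```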

More complicated equations
We often want to fit more complex nonlinear equations that we can't simply linearize. Exponential decay, logistic growth, von Bertalanffy growth curves, Holling functional response curves, and KM survival curves are all common nonlinear equations we may want to fit. We may want to fit an equation that is standard in our field (VB growth curves in fisheries).

Fitting these equations
How are these curves fit?
[Figure: von Bertalanffy growth curve, observed and predicted lengths (cm) plotted against age]
Nonlinear model fitting involves some sort of iterative procedure: one set of parameter estimates is used to find another set of parameter estimates, and so on.
Let's watch Solver do its thing (go to the Excel VB demo).
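As a rough illustration of an iterative fit, the sketch below fits a von Bertalanffy curve by repeatedly narrowing a grid of candidate K values. It is not the algorithm Solver uses. Several things here are assumptions: t0 is fixed at 0 to keep the search one-dimensional, the data are synthetic and noise-free, and all names are hypothetical.

```python
import math

def predicted_length(age, l_inf, k):
    """von Bertalanffy growth curve with t0 fixed at 0 (an assumption)."""
    return l_inf * (1.0 - math.exp(-k * age))

def best_l_inf(k, ages, lengths):
    """For a fixed k the model is linear in l_inf, so it has a closed form."""
    g = [1.0 - math.exp(-k * t) for t in ages]
    return (sum(obs * gi for obs, gi in zip(lengths, g))
            / sum(gi * gi for gi in g))

def sum_of_squares(k, ages, lengths):
    """SS between observed and predicted lengths at the best l_inf for k."""
    l_inf = best_l_inf(k, ages, lengths)
    return sum((obs - predicted_length(t, l_inf, k)) ** 2
               for t, obs in zip(ages, lengths))

def fit_von_bertalanffy(ages, lengths, k_low=0.05, k_high=1.0,
                        points=21, rounds=8):
    """Each round's best k defines a narrower search interval for the next."""
    k_best = k_low
    for _ in range(rounds):
        step = (k_high - k_low) / (points - 1)
        grid = [k_low + i * step for i in range(points)]
        k_best = min(grid, key=lambda k: sum_of_squares(k, ages, lengths))
        k_low = max(k_best - step, 1e-6)  # keep k strictly positive
        k_high = k_best + step
    return best_l_inf(k_best, ages, lengths), k_best

# Synthetic data from assumed parameters (L_inf = 120 cm, K = 0.3).
ages = list(range(1, 17))
lengths = [predicted_length(t, 120.0, 0.3) for t in ages]

l_inf_hat, k_hat = fit_von_bertalanffy(ages, lengths)
print(f"L_inf = {l_inf_hat:.2f}, K = {k_hat:.4f}")
```

Each pass uses the previous best estimate to decide where to look next, which is the "one set of estimates leads to another" idea in miniature.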

Fitting these equations
- What about variance estimates? They are output by some stats programs.
- How do we assess uncertainty? Using the confidence intervals and graphically.
- How do we fit more complex models? Similar approaches here (what about uncertainty?).
- Bootstraps, likelihood, and Bayesian credible intervals are all to come.
- Take Ben Bolker's class in Zoology! http://www.zoo.ufl.edu/bolker/emdbook/index.html

Return papers
- Reviewed by two of us.
- Encourage you to use peer review in the future.