BIOL 4605/7220 CH 20.1 Correlation

Similar documents
7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

Unit 14: Nonparametric Statistical Methods

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Chapter 12 - Lecture 2 Inferences about regression coefficient

Psychology 282 Lecture #4 Outline Inferences in SLR

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

UNIT 4 RANK CORRELATION (Rho AND KENDALL RANK CORRELATION

In many situations, there is a non-parametric test that corresponds to the standard test, as described below:

Financial Econometrics and Volatility Models Copulas

Statistics Introductory Correlation

Data Analysis as a Decision Making Process

Inter-Rater Agreement

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

Measuring relationships among multiple responses

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

N Utilization of Nursing Research in Advanced Practice, Summer 2008

Chapter 13 Correlation

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Reminder: Student Instructional Rating Surveys

Analyzing Small Sample Experimental Data

Textbook Examples of. SPSS Procedure

Can you tell the relationship between students SAT scores and their college grades?

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

Measuring Associations : Pearson s correlation

Business Statistics. Lecture 10: Correlation and Linear Regression

Correlation: Relationships between Variables

Bivariate Relationships Between Variables

CORELATION - Pearson-r - Spearman-rho

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Non-parametric tests, part A:

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Correlation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?

1. Pearson linear correlation calculating testing multiple correlation. 2. Spearman rank correlation calculating testing 3. Other correlation measures

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Introduction to Statistical Data Analysis III

Correlation & Linear Regression. Slides adopted fromthe Internet

PubH 7405: REGRESSION ANALYSIS. Correlation Analysis

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Understand the difference between symmetric and asymmetric measures

Statistical. Psychology

Rank parameters for Bland Altman plots

Institute of Actuaries of India

NON-PARAMETRIC STATISTICS * (

Statistics: revision

E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou.

REVIEW 8/2/2017 陈芳华东师大英语系

Types of Statistical Tests DR. MIKE MARRAPODI

My data doesn t look like that..

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Correlation and Linear Regression

Confidence Intervals, Testing and ANOVA Summary

Key Words: absolute value methods, correlation, eigenvalues, estimating functions, median methods, simple linear regression.

Ch. 16: Correlation and Regression

A3. Statistical Inference Hypothesis Testing for General Population Parameters

Ch 2: Simple Linear Regression

Readings Howitt & Cramer (2014) Overview

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs)

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA)

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Nonparametric Independence Tests

Readings Howitt & Cramer (2014)

Introduction to Statistical Inference Lecture 10: ANOVA, Kruskal-Wallis Test

Inferences for Correlation

A Short Course in Basic Statistics

Linear Correlation and Regression Analysis

Nonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health

1 A Review of Correlation and Regression

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true

Statistics Handbook. All statistical tables were computed by the author.

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

10/31/2012. One-Way ANOVA F-test

Study Sheet. December 10, The course PDF has been updated (6/11). Read the new one.

Empirical Power of Four Statistical Tests in One Way Layout

Inferences for Regression

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Chapter 9 - Correlation and Regression

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

DATA IN SERIES AND TIME I. Several different techniques depending on data and what one wants to do

9 Correlation and Regression

Introduction to Linear regression analysis. Part 2. Model comparisons

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

Data files & analysis PrsnLee.out Ch9.xls

Review of Statistics 101

Chapter 24. Comparing Means

Data files for today. CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams

Contents. Acknowledgments. xix

Chapter Eight: Assessment of Relationships 1/42

3. Nonparametric methods

Nonparametric Statistics

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Transcription:

BIOL 4605/70 CH 0. Correlation GPT Lectures Cailin Xu November 9, 0

GLM: correlation Regression ANOVA Only one dependent variable GLM ANCOVA Multivariate analysis Multiple dependent variables (Correlation)

Correlation Two variables associated with each other? No casual ordering (i.e., NEITHER is a function of the other) Total length of aphid stem mothers Mean thorax length of their parthenogenetic offspring Data from Box 5.4 Sokal and Rohlf 0

vs. Correlation

vs. Correlation

Rotate Correlation

Regression vs. Correlation Regression Does depend on X? (describe func. relationship/predict) Usually, X is manipulated & is a random variable Casual ordering =f(x) Correlation Are and related? Both & are random variables No casual ordering

Correlation: parametric vs. non-parametric Parametric measures: Pearson s correlation Nonparametric measures: Spearman s Rho, Kendall s Tau Type of data Measurements (from Normal/Gaussian Population) Ranks, Scores, or Data that do not meet assumptions for sampling distribution (t, F, χ ) Measures of correlation Parametric: Pearson s correlation Nonparametric: Spearman s Rho, Kendall s Tau

Pearson s Correlation Coefficient (ρ) - Strength of relation between two variables & - Geometric interpretation Regression of on ρ = cos(θ ) θ Regression of on Perfect positive association: ϴ=0 ρ= No association: ϴ=90 ρ=0 Perfect negative association: ϴ=80 ρ=- - ρ, true relation

Pearson s Correlation Coefficient (ρ) - Strength of relation between two variables & - Geometric interpretation - Definition ρ, = cov(, σ σ ) = E [( µ ) ( µ )] σ σ Covariance of the two variables divided by the product of their standard deviations

Pearson s Correlation Coefficient (ρ) - Strength of relation between two variables - Geometric interpretation - Definition ( ˆ ρ = r) - Estimate from a sample & Parameter Estimate Name Symbol Mean of µ Mean of Variance of Variance of µ σ σ s s

Pearson s Correlation Coefficient (ρ) - Strength of relation between two variables - Geometric interpretation - Definition - Estimate from a sample ) ˆ ( r = ρ Parameter Estimate & µ µ σ σ s s ( ) ( ) [ ], ), cov( E σ σ µ µ σ σ ρ = = ( )( ) ( )( ) ( ) ( ) = = = i i i i i i i i i i s s n r ˆρ

Pearson s Correlation: Significance Test - Determine whether a sample correlation coefficient could have come from a population with a parametric correlation coefficient of ZERO - Determine whether a sample correlation coefficient could have come from a population with a parametric correlation coefficient of CERTAIN VALUE 0 - Generic recipe for Hypothesis Testing

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error State frequency distribution Calculate statistic Calculate p-value Declare decision Report statistic with decision

Hypothesis Testing --- Generic Recipe State population All measurements on total length of aphid stem mothers & mean thorax length of their parthenogenetic offspring made by the same experimental protocol ). Randomly sampled ). Same environmental conditions

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) Correlation of the two variables, ρ In the case = = otherwise n df distribution t LARGE n if n r N r,, ), 0, ) ~ ˆ, ρ = n r r t ρ 0 : 0 = ρ H 3 = n z t η = = + = 3 ) var(, ) (, ln n z z E r r z where η z: Normal/tends to normal rapidly as n increases for ρ 0 t-statistic: N(0, ) or t (df = ) + = ln ρ ρ η 0) ( : 0 = ρ ρ ρ H In the case

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) Correlation of the two variables, ρ In the case = =, ), 0, ) ~ ˆ, n df distribution t LARGE n if n r N r ρ = n r r t ρ 0 : 0 = ρ H

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis H 0 : ρ = 0

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis H A : ρ 0

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error α = 5% ( conventional level)

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error State frequency distribution t-distribution

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error State frequency distribution Calculate statistic t-statistic correlation coefficient estimate, r = 0.65 t = (0.65 0)/0.076 = 3.084

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error State frequency distribution Calculate statistic Calculate p-value t = 3.084, df = 3 p = 0.0044 (one-tail) & 0.0088 (two-tail)

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error State frequency distribution Calculate statistic Calculate p-value Declare decision p = 0.0088 < α = 0.05 reject accept H 0 H A : ρ 0

Hypothesis Testing --- Generic Recipe State population State model/measure of pattern (statistic) State null hypothesis State alternative hypothesis State tolerance for Type I error State frequency distribution Calculate statistic r = 0.65, n = 5, p = 0.0088 Total length & offspring thorax length are related Report statistic with decision Calculate p-value Declare decision

Pearson s Correlation Assumptions -6-4 - 0 4 6 Assumptions Normal & independent errors Homogeneous around straight line Comp. -0.6-0.4-0. 0.0 0. 0.4 0.6 Lmother 7 5 4 4 0 Lthor 3 3 6 9 8 5-6 -4-0 4 6-0.6-0.4-0. 0.0 0. 0.4 0.6 Comp. What if assumptions for Pearson test not met? Here are the observations relative to the correlation line (comp ) Not homogeneous, due to outliers (observations 8 & 9)

Pearson s Correlation Randomization test Significance test with no distributional assumptions Hold one variable, permute the other one many times A new r from each new permutation Construct empirical frequency distribution Compare the empirical distribution with the observed r

Pearson s Correlation Randomization test 8000 times p = p(r > 0.65) = 0.00875 p = p (r < -0.65) = 0.003875-0.65 0.65 p = p + p = 0.00575 < α = 0.05 Reject Null, accept alternative Consistent with testing result from theoretical t-distribution, for this data

Pearson s Correlation coefficient Confidence Limit 95% confidence limit (tolerance of Type I error @ 5%) t-distribution (df = n ) (NO) a). H0: ρ = 0 was rejected b). Distribution of r is negatively skewed c). Fisher s transformation + r z η z = ln ; ~ N t r n 3 η = + ρ ln ρ ( 0,) or [ ]

Pearson s Correlation coefficient Confidence Limit C. I. for η: zl = z z(α / ) /( n 3), z zu = z + z(α / ) /( n 3) (α / ), critical value from N(0, ) at p = -α/ C. I. for ρ: For our example: rl r u exp(zl ) = tanh( zl ) = exp(zl ) + exp(zu ) = tanh( zu ) = exp(z ) + u 95 percent confidence interval: r l r u = 0.07 = 0.87

Nonparametric: Spearman s Rho Measure of monotone association used when the distribution of the data make Pearson's correlation coefficient undesirable or misleading Spearman s correlation coefficient (Rho) is defined as the Pearson s correlation coefficient between the ranked variables Rho = i ( yi y )( yi y ) i ( ) ( ), y y y y i i 6 i i i If no ties, Rho =, where di = yi y i n ( n d ) where y i, yi are ranks of i, i Randomization test for significance (option)

Nonparametric: Kendall s Tau Concordant pairs ( i, i ) and ( j, j If i > j and i > j or if i < j and i < j ) : (if the ranks for both elements agree) Discordant pairs ( i, i ) and ( j, j If i > j and i < j or if i < j and i > j ) : (if the ranks for both elements disagree) Neither concordant or discordant If i = j or i = j

Nonparametric: Kendall s Tau Kendall s Tau = nc nd n( n ) n c nd nc + nd (no ties) (in the case of ties), where n n d c = number of concordant pairs = number of discordant pairs Properties: Gamma coefficient or Goodman correlation coefficient The denominator is the total number of pairs, - tau tau =, for perfect ranking agreement tau = -, for perfect ranking disagreement tau 0, if two variables are independent For large samples, the sampling distribution of tau is approximately normal

Nonparametric For more information on nonparametric test of correlation e.g., significance test, etc. References: Conover, W.J. (999) Practical nonparametric statistics, 3 rd ed. Wiley & Sons Kendall, M. (948) Rank Correlation Methods, Charles Griffin & Company Limited Caruso, J. C. & N. Cliff. (997) "Empirical Size, Coverage, and Power of Confidence Intervals for Spearman's Rho", Ed. and Psy. Meas., 57 pp. 637 654 Corder, G.W. & D.I. Foreman. (009) "Nonparametric Statistics for Non- Statisticians: A Step-by-Step Approach", Wiley

Data Total length of aphid stem mothers () Vs. Mean thorax length of their parthenogenetic offspring () # 8.7 5.95 8.5 5.65 3 9.4 6.00 4 0.0 5.70 5 6.3 4.70 6 7.8 5.53 7.9 6.40 8 6.5 4.8 9 6.6 6.5 0 0.6 5.93 0. 5.70 7. 5.68 3 8.6 6.3 4. 6.30 5.6 6.03

Total length of mothers Vs. Mean thorax length of offspring # RAW RANK y y 8.7 5.95 8 9 8.5 5.65 6 4 3 9.4 6.00 9 0 4 0.0 5.70 0 6.5 5 6.3 4.70 6 7.8 5.53 5 3 7.9 6.40 5 5 8 6.5 4.8 9 6.6 6.5 3 3 0 0.6 5.93 8 0. 5.70 6.5 7. 5.68 4 5 3 8.6 6.3 7 4. 6.30 3 4 5.6 6.03 4

Group Activity

Activity Instructions Question: REGRESSION or CORRELATION? Justification guideline: Regression: X y Correlation: X,... Xn unknown

Activity Instructions Form small groups or -3 people. Each group is assigned a number Group members work together on each example for 5 minutes, come up with an answer & your justifications A number will be randomly generated from the group # s The corresponding group will have to present their answer & justifications Go for the next example...

Activity Instructions There is NO RIGHT/WRONG ANSWER (for these examples), as long as your justifications are LOGICAL

Example Height and ratings of physical attractiveness vary across individuals. Would you analyze this as regression or correlation? Subject Height Phy 69 7 6 8 3 68 6 4 66 5 5 66 8.... 48 7 0

Example Airborne particles such as dust and smoke are an important part of air pollution. Measurements of airborne particles made every six days in the center of a small city and at a rural location 0 miles southwest of the city (Moore & McCabe, 999. Introduction to the Practice of Statistics). Would you analyze this relation as regression or correlation?

Example 3 A study conducted in the Egyptian village of Kalama examined the relation between birth weights of 40 infants and family monthly income (El-Kholy et al. 986, Journal of the Egyptian Public Health Association, 6: 349). Would you analyze this relation as regression or correlation?