Tests of Relationships: Correlation (Bivariate Linear Correlation). 9/8/2018


Tests of Relationships: Parametric and Nonparametric Approaches

The question is whether samples from two different variables vary together in a linear fashion.
- Parametric: Pearson product-moment correlation
- Nonparametric: Spearman rank-order correlation

We are NOT examining whether one variable depends on the other (that is regression), but rather whether they vary together: co-variation between the variables, not the amount of variation explained in a dependent variable by an independent variable. In correlation, the dependent (y-axis) and independent (x-axis) variables are interchangeable. For example:
- Tusk mass and body mass in elephants
- Mammalian diversity and insect diversity in nature reserves
- Swimming speed and body length in brine shrimp

Bivariate Linear Correlation

Do two variables co-vary in a linear fashion? Pearson's test assumes a bivariate normal distribution.
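The symmetry point above (neither variable is "dependent") can be seen directly in the coefficient itself. A minimal pure-Python sketch of Pearson's r; the paired measurements below are made up purely for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements (think tusk mass vs body mass).
x = [2.1, 3.4, 4.0, 5.2, 6.1, 7.3]
y = [1.0, 1.8, 2.1, 2.9, 3.2, 4.1]

# r is symmetric in its arguments: swapping x and y changes nothing,
# which is exactly why correlation treats the axes as interchangeable.
print(pearson_r(x, y) == pearson_r(y, x))
print(round(pearson_r(x, y), 3))
```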

Look at the Data

Making the assumption of linearity: as long as there is no obvious curvature to the relationship, it is OK to proceed.

Comparing Pearson's and Spearman's Tests

Similarities: both test for a linear relationship; both use two samples, one from each of two variables; and the data in the samples are related (paired).

Differences: Pearson's test is parametric and requires scale data only; Spearman's test is nonparametric and accepts scale or ordinal data.

The Coefficient

The statistic used in both a Pearson's and a Spearman's test is a correlation coefficient: Pearson's r or Spearman's r_s.

Pearson's Test and Parametric Criteria

The two variables should come from a bivariate normal distribution: for each value of one of the variables, the corresponding values of the other variable should be normally distributed, and vice versa. In practice, however, you are likely to have only one value of one variable corresponding to a single value of the second variable. Providing that you have no reason to think that the data might not conform to these criteria, you can assume that they do.

The Coefficient and Significance

Significance is determined by the strength of r and the sample size: very low r values (e.g., r = 0.08) can be statistically significant with very large sample sizes. Is such a result biologically significant?
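The interplay of r and sample size can be made concrete with the usual t transformation of r, t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a t distribution with n - 2 degrees of freedom. A sketch with hypothetical sample sizes:

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# The same weak correlation, r = 0.08, at two sample sizes.
t_big = t_statistic(0.08, 1000)   # exceeds ~1.96: statistically significant
t_small = t_statistic(0.08, 30)   # well below ~2.05: nonsignificant
print(round(t_big, 2), round(t_small, 2))
```

Statistical significance at n = 1000 says nothing about whether an r of 0.08 matters biologically, which is the slide's point.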

r versus r^2

The correlation coefficient, r, can be squared to give r^2. Whereas r represents the Pearson correlation coefficient, we associate r^2 with the coefficient of determination in regression. Although the square of r is r^2, their interpretations are quite different:
- r represents the co-variation between the two variables.
- r^2 is the percentage of the variation in the dependent variable that is explained by incorporating the independent variable.

Partial Coefficients

Pearson's r can be extended to measure the relationship between two variables while one or more other variables are controlled (a partial correlation), e.g., wing size versus wing length while we control for body mass.

Pearson's Test: Example

Suppose a wildlife biologist collects data from the published and unpublished work of other scientists to generate an extensive data set on caribou herds scattered throughout the northern hemisphere. Suitable information exists for the survival of collared calves in nine herds during their first summer of life. A reliable estimate of wolf presence is also available for the following winter. The presence of other predators (e.g., grizzly bear and lynx) in the vicinity of these herds during this time is estimated to be low and therefore need not be included in the models.
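The partial-coefficient idea above has a standard closed form for one controlled variable: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A sketch with hypothetical coefficients (the wing-size / wing-length / body-mass values are invented for illustration):

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical: wing size and wing length correlate at 0.80, but both
# also correlate with body mass (0.70 and 0.65). Controlling for body
# mass weakens the apparent association between the wing measures.
r_controlled = partial_r(0.80, 0.70, 0.65)
print(round(r_controlled, 3))
```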

Pearson's Test: Assumptions

[Figures: graphical checks of the assumptions; no text transcribed.]

Pearson's Test

Calculate the test statistic: for a Pearson's test, the statistic is r, with degrees of freedom = n - 2 (n_1 = n_2 = n, the number of pairs).

Using a critical value table:
- If |r| >= r critical, reject H0.
- If |r| < r critical, accept H0 (nonsignificant result).

Using an exact P value:
- If P <= alpha, reject H0.
- If P > alpha, accept H0 (nonsignificant result).

Report: r = 0.893, df = 7, P = 0.001 (recall df = n - 2 = 9 - 2 = 7).
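The critical-value decision rule can be sketched in plain Python. The critical value below (r critical of about 0.666 for df = 7, alpha = 0.05, two-tailed) is a commonly tabled value and should be checked against your own table:

```python
def pearson_decision(r, r_critical):
    """Two-tailed decision rule: compare |r| with the tabled critical value."""
    return "reject H0" if abs(r) >= r_critical else "accept H0 (nonsignificant)"

R_CRITICAL = 0.666  # tabled value for df = 7, alpha = 0.05, two-tailed (verify)

# The lecture's result for the caribou example: r = 0.893, df = 9 - 2 = 7.
print(pearson_decision(0.893, R_CRITICAL))   # reject H0
# A weaker coefficient at the same df would not clear the threshold.
print(pearson_decision(0.424, R_CRITICAL))   # accept H0 (nonsignificant)
```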

Spearman's Test: Assumptions

A Spearman correlation is a nonparametric test for assessing whether the linear relationship between two samples can be accounted for by sampling error alone. You still need to check a scatterplot to make sure that a linear model might be reasonable.

Use a Spearman's test when:
- You are looking for a relationship between two samples, one sample from each of two variables.
- You can assume the relationship is linear.
- The data in the samples are ordinal or scale level.

Calculate the test statistic: for a Spearman's test, the statistic is r_s, with degrees of freedom = n - 2 (n_1 = n_2 = n).

Using a critical value table:
- If |r_s| >= r_s critical, reject H0.
- If |r_s| < r_s critical, accept H0 (nonsignificant result).

Using an exact P value:
- If P <= alpha, reject H0.
- If P > alpha, accept H0 (nonsignificant result).
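One standard way to compute Spearman's r_s (not spelled out on the slides) is to rank each sample, averaging tied ranks, and then apply Pearson's formula to the ranks. A sketch on hypothetical data:

```python
from math import sqrt

def ranks(values):
    """Ranks 1..n, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman_rs(x, y):
    """Spearman's r_s: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Monotonic but curved data: r_s is exactly 1.0 because only order matters.
print(spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))   # 1.0
```

This illustrates why Spearman's test only needs ordinal-level data: the raw values are discarded in favour of their ranks.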

Report: r_s = 0.424, df = 7, P = 0.256 (recall df = n - 2 = 9 - 2 = 7).

Collinearity in Models

For models with multiple independent variables, the independent variables must also be independent of each other. There are a number of different ways to assess this.

Screening for Collinear Variables

Pairwise screening helps, but what about collinear combinations of independent variables?

Tolerance and Variance Inflation Factor

In statistics, the variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient (the square of the estimate's standard deviation) is increased because of collinearity.

Tolerance for the i-th independent variable is 1 minus the proportion of variance it shares with the other independent variables in the analysis (1 - R_i^2). This represents the proportion of variance in the i-th independent variable that is not related to the other independent variables in the model. The variance inflation factor (VIF) is the reciprocal of tolerance: 1 / (1 - R_i^2).

There is a potential problem with collinearity if:
- Tolerance < 0.10
- VIF > 10

But...
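A sketch of tolerance and VIF for the special case of exactly two independent variables, where R_i^2 reduces to the squared pairwise correlation (the data below are hypothetical and nearly collinear; with more predictors you would regress each predictor on all the others to get R_i^2):

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def tolerance_and_vif(x1, x2):
    """Two-predictor case: R_i^2 is just the squared pairwise r."""
    r2 = pearson_r(x1, x2) ** 2
    tol = 1 - r2               # tolerance = 1 - R_i^2
    return tol, 1 / tol        # VIF = 1 / tolerance

# Hypothetical, nearly collinear predictors (x2 is roughly 2 * x1).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
tol, vif = tolerance_and_vif(x1, x2)
print(tol < 0.10, vif > 10)   # both rule-of-thumb thresholds are tripped
```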

Cautionary Notes

O'Brien cautions: "These techniques for curing problems associated with multicollinearity can create problems more serious than those they solve. Because of this, we examine these rules of thumb and find that threshold values of the VIF (and tolerance) need to be evaluated in the context of several other factors that influence the variance of regression coefficients. Values of the VIF of 10, 20, 40, or even higher do not, by themselves, discount the results of regression analyses, call for the elimination of one or more independent variables from the analysis, suggest the use of ridge regression, or require combining of independent variables into a single index."

O'Brien, R.M. 2007. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality and Quantity 41:673-690.