Tests of Relationships: Parametric and Non-parametric Approaches

Do samples from two different variables vary together in a linear fashion?
- Parametric: Pearson product-moment correlation
- Non-parametric: Spearman rank-order correlation

We are NOT examining whether one variable depends on the other (that is regression), but whether the two vary together: co-variation between variables, not the amount of variation in a dependent variable explained by an independent variable. Are the dependent (y-axis) and independent (x-axis) variables interchangeable? In correlation, yes.

Examples:
- Tusk mass and body mass in elephants
- Mammalian diversity and insect diversity in nature reserves
- Swimming speed and body length in brine shrimp

Bivariate linear correlation
- Do two variables co-vary in a linear fashion?
- Assumes a bivariate normal distribution.
Look at the data...
- Making the assumption of linearity: as long as there is no obvious curvature to the relationship, it is OK to proceed.

Comparing Pearson's and Spearman's tests
- Similarities: both are tests of a linear relationship; two samples, one from each of two variables; the data in the samples are related (paired).
- Differences: Pearson's test is parametric and requires scale data; Spearman's test is non-parametric and accepts scale or ordinal data.

The coefficient
- The statistic used in both a Pearson's and a Spearman's test is a correlation coefficient: Pearson's r or Spearman's r_s.

Pearson's test and parametric criteria
- The two variables come from a bivariate normal distribution: for each value of one variable, the corresponding values of the other variable should be normally distributed, and vice versa.
- In practice, you are likely to have only one value of one variable corresponding to a single value of the second variable, so provided you have no reason to think the data do not conform to these criteria, you can assume that they do.

The coefficient and significance
- Significance is determined by the strength of r and the sample size: very low r values (e.g., r = 0.08) can be significant with very large sample sizes. Is such a result biologically significant?
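To make the coefficient concrete, here is a minimal pure-Python sketch of Pearson's r computed from its definition (the data values are invented for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented example: a perfectly linear relationship gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```

Values of r run from -1 (perfect negative co-variation) through 0 (none) to +1 (perfect positive co-variation).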
r versus r²
- The correlation coefficient, r, can be squared to give r².
- Whereas r is the Pearson correlation coefficient, r² is associated with the coefficient of determination in regression.
- Although r² is simply the square of r, their interpretations are quite different: r represents co-variation between the two variables, whereas r² is the percentage of the variation in the dependent variable that is explained by incorporating the independent variable.

Partial coefficients
- Pearson's r can be extended to measure the relationship between two variables when one or more other variables are controlled, e.g., wing size versus wing length while controlling for body mass: a partial correlation.

Pearson's test: example
- Suppose a wildlife biologist collects data from the published and unpublished work of other scientists to generate an extensive data set on caribou herds scattered throughout the northern hemisphere.
- Suitable information exists on the survival of collared calves in nine herds during their first summer of life.
- A reliable estimate of wolf presence is also available for the following winter.
- The presence of other predators (e.g., grizzly bear and lynx) in the vicinity of these herds during this time is estimated to be low and therefore need not be included in the models.
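One common way to obtain a partial correlation is the residual method: remove the straight-line effect of the control variable from each of the two variables of interest, then correlate what is left over. A minimal numpy sketch (the variable names and data below are hypothetical, not taken from the caribou example):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y, controlling for z (residual method)."""
    # Residuals after removing the linear effect of the control variable z
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical data: x and y both track z, but their deviations from z oppose
z = np.array([0.0, 1.0, 2.0, 3.0])
e = np.array([1.0, -1.0, -1.0, 1.0])   # chosen orthogonal to z and intercept
x = 2 * z + e
y = 3 * z - e
print(np.corrcoef(x, y)[0, 1])   # raw correlation: strongly positive
print(partial_corr(x, y, z))     # close to -1 once z is controlled
```

The raw correlation here is driven almost entirely by the shared dependence on z; controlling for z reverses the sign, which is exactly the situation partial coefficients are designed to expose.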
Pearson test
- Calculate the test statistic: for a Pearson test, the statistic is r, with degrees of freedom = n − 2 (n₁ = n₂ = n).
- Using a critical value table: if r ≥ r critical, reject H0; if r < r critical, accept H0 (non-significant result).
- Using an exact P value: if P ≤ α, reject H0; if P > α, accept H0 (non-significant result).
- Report: r = 0.893, df = 7, P = 0.001 (recall df = n − 2 = 9 − 2 = 7).
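The exact P value for a Pearson r follows from the standard transformation t = r·√df / √(1 − r²), which under H0 has a t-distribution with df = n − 2. A sketch checking the reported result above (scipy is assumed to be available):

```python
import math
from scipy import stats

r, df = 0.893, 7   # values reported on the slide (n = 9)

# Transform r to a t statistic, then take the two-tailed tail area
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
p = 2 * stats.t.sf(abs(t), df)

print(round(t, 2), round(p, 3))   # t about 5.25, P about 0.001
```

This reproduces the reported P = 0.001 (to three decimal places) from r and df alone.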
Spearman's test: assumptions
- A Spearman correlation is a non-parametric test for assessing whether the linear relationship between two samples can be accounted for by sampling error alone.
- Check a scatterplot to make sure that a linear model might be reasonable.
- Use when you are looking for a relationship between two samples, one sample from each of two variables; you can assume the relationship is linear; the data in the samples are ordinal or scale level.

Spearman test
- Calculate the test statistic: for a Spearman test, the statistic is r_s, with degrees of freedom = n − 2 (n₁ = n₂ = n).
- Using a critical value table: if r_s ≥ r_s critical, reject H0; if r_s < r_s critical, accept H0 (non-significant result).
- Using an exact P value: if P ≤ α, reject H0; if P > α, accept H0 (non-significant result).
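Spearman's r_s can be understood as Pearson's r computed on the ranks of the data, which is why it only needs ordinal-level information. A pure-Python sketch with invented, tie-free data:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def ranks(v):
    """Rank values 1..n (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x, y):
    """Spearman's r_s: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Invented data: monotonic but curved, so r_s = 1 while Pearson's r < 1
x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]   # y = x**3
print(spearman_rs(x, y))       # 1.0
print(pearson_r(x, y) < 1.0)   # True
```

Because ranking discards curvature, a perfectly monotonic but non-linear relationship still gives r_s = 1, while Pearson's r on the raw values falls below 1.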
- Report: r_s = 0.424, df = 7, P = 0.256 (recall df = n − 2 = 9 − 2 = 7).

Collinearity in models
- In models with multiple independent variables, the independent variables must also be independent of each other.
- There are a number of different ways to assess this.

Screening for collinear variables
- But what about collinear combinations of independent variables?

Tolerance and Variance Inflation Factor
- In statistics, the variance inflation factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient (the square of the estimate's standard deviation) is increased because of collinearity.
- Tolerance for the i-th independent variable is 1 minus the proportion of variance it shares with the other independent variables in the analysis (1 − R²_i). This represents the proportion of variance in the i-th independent variable that is not related to the other independent variables in the model.
- The Variance Inflation Factor (VIF) is the reciprocal of tolerance: 1/(1 − R²_i).
- A potential collinearity problem is indicated if tolerance < 0.10 or VIF > 10. But...
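Tolerance and VIF can be computed directly from their definitions: regress each independent variable on all the others, take that regression's R², and then tolerance = 1 − R² and VIF = 1/tolerance. A numpy sketch using invented predictor columns:

```python
import numpy as np

def tolerance_and_vif(X, j):
    """Tolerance and VIF for column j of the predictor matrix X."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    tol = 1 - r2
    return tol, 1 / tol

# Invented predictors: x2 is nearly 2 * x1, x3 is largely unrelated
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.1, 3.9, 6.0, 8.1, 9.9])
x3 = np.array([1.0, -1.0, 2.0, -2.0, 0.0])
X = np.column_stack([x1, x2, x3])

tol, vif = tolerance_and_vif(X, 0)
print(vif > 10)   # True: x1 is collinear with x2, flagging a problem
```

Dropping the collinear column and screening x1 against x3 alone gives a VIF near 1, i.e., essentially no shared variance.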
Cautionary notes
- "These techniques for curing problems associated with multicollinearity can create problems more serious than those they solve. Because of this, we examine these rules of thumb and find that threshold values of the VIF (and tolerance) need to be evaluated in the context of several other factors that influence the variance of regression coefficients. Values of the VIF of 10, 20, 40, or even higher do not, by themselves, discount the results of regression analyses, call for the elimination of one or more independent variables from the analysis, suggest the use of ridge regression, or require combining of independent variables into a single index."

O'Brien, R.M. 2007. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality and Quantity 41:673-690.