Correlation: Relationships between Variables

Correlation: Relationships between Variables

So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means. However, researchers are often interested in graded relationships between variables, such as how well one variable can predict another. Examples:

- How well do SAT scores predict a student's GPA?
- How is the amount of time a student takes to complete an exam related to her grade on that exam?
- How well do IQ scores correlate with income?
- How does a child's height correlate with his running speed?
- How does class size affect student performance?

Correlation: Relationships between Variables

Correlation is a statistical technique used to describe the relationship between two variables. Usually the two variables are simply observed as they exist in the environment, with no experimental manipulation (a correlational study). However, results from experimental studies (in which one of the variables is systematically manipulated) can also be analyzed using correlation.

Mean Comparison Approach

Height   Weight
70       150
67       140
72       180
75       190
68       145
69       150
71.5     164
71       140
72       142
69       136
67       123
68       155
66       140
72       145
73.5     160
73       190
69       155
73       165
72       150

Weights after a median split on height (Short: height 71 or less; Tall: height 71.5 or more):

Short   Tall
140     164
140     180
123     142
145     145
155     150
150     190
136     165
155     160
150     190
140
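The mean-comparison approach above can be sketched in code. Python is my choice here, not part of the original slides; the median split reproduces the slide's Short/Tall grouping:

```python
import statistics

# Height/weight pairs from the slide's table.
heights = [70, 67, 72, 75, 68, 69, 71.5, 71, 72, 69,
           67, 68, 66, 72, 73.5, 73, 69, 73, 72]
weights = [150, 140, 180, 190, 145, 150, 164, 140, 142, 136,
           123, 155, 140, 145, 160, 190, 155, 165, 150]

# Median split: classify each person as "short" or "tall" relative to
# the median height (71), then compare mean weight between the groups.
med = statistics.median(heights)
short = [w for h, w in zip(heights, weights) if h <= med]
tall = [w for h, w in zip(heights, weights) if h > med]

print(round(statistics.mean(short), 1), round(statistics.mean(tall), 1))
```

The tall group's mean weight is clearly higher, but the split throws away the graded information that correlation preserves, which is the point of the slides that follow.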

Correlation: Scatter Plots

[Scatter plot of weight against height for the data below]

Height   Weight
70       150
67       140
72       180
75       190
68       145
69       150
71.5     164
71       140
72       142
69       136
67       123
68       155
66       140
72       145
73.5     160
73       190
69       155
73       165
72       150


Characteristics of the Correlation

A correlation coefficient is a single number describing the relationship between two variables. This number describes:

The direction of the relationship
- Variables sharing a positive correlation tend to change in the same direction (e.g., height and weight): as the value of one variable (height) increases, the value of the other variable (weight) also increases.
- Variables sharing a negative correlation tend to change in opposite directions (e.g., snowfall and beach visitors): as the value of one variable (amount of snowfall) increases, the value of the other variable (number of beach visitors) decreases.

The strength of the relationship
- Variables that share a strong correlation (close to +1 or -1) strongly predict one another, while variables that share a weak correlation (near 0) do not.

Positive versus Negative Correlations

[Scatter plots illustrating a positive correlation and a negative correlation]

Strong versus Weak Correlations

[Scatter plots illustrating strong and weak correlations]

Correlation is not Causation

Possible Sources of Correlation

- The relationship is causal: manipulating the predictor variable causes an increase or decrease in the criterion variable (e.g., leg strength and sprinting speed).
- The causal relationship is backwards (reverse causality): manipulating the criterion variable causes changes in the predictor variable.
- The two variables work together systematically to cause an effect.
- The relationship is due to one or more confounding variables: changes in both variables reflect the effect of a confounding variable (e.g., intelligence as an explanation for correlated performance on different exams; increasing population density in cities increases both the number of physicians and the number of crimes).

Measuring Correlation: Pearson's r

To compute a correlation you need a pair of scores, X and Y, for each individual in the sample. The most commonly used measure of correlation is Pearson's product-moment correlation coefficient, or more simply, Pearson's r. Conceptually, Pearson's r is a ratio between the degree to which two variables (X and Y) vary together and the degree to which they vary separately:

    r = co-variability(X, Y) / [variability(X) · variability(Y)]

The Covariance

The term in the numerator of Pearson's r is the covariance, an unnormalized statistic representing the degree to which two variables (X and Y) vary together:

    cov_XY = Σ(X − M_X)(Y − M_Y) / (n − 1)

Mathematically, it is the average of the product of the deviations of two paired variables. The covariance depends both on how consistently X and Y tend to vary together and on the individual variability of the variables (X and Y).
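The definitional formula can be checked directly in code (Python here is an assumption, not part of the original slides), using the small X/Y data set from the worked example later in the deck:

```python
# Covariance as the average product of paired deviations
# (n - 1 in the denominator for a sample).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

n = len(X)
mx = sum(X) / n  # mean of X = 6
my = sum(Y) / n  # mean of Y = 4

cov_xy = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n - 1)
print(cov_xy)  # 28 / 4 = 7.0
```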

The Covariance

Notice that the formula for the covariance looks a lot like the formula for the variance:

    s²_X = Σ(X − M_X)² / (n − 1) = Σ(X − M_X)(X − M_X) / (n − 1)

    cov_XY = Σ(X − M_X)(Y − M_Y) / (n − 1)

The Covariance

Moreover, they share a similar computational formula:

    s²_X = SS_X / (n − 1), where SS_X = Σ(X − M_X)² = ΣX² − (ΣX)² / n

    cov_XY = SP_XY / (n − 1), where SP_XY = Σ(X − M_X)(Y − M_Y) = ΣXY − (ΣX)(ΣY) / n
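A quick sketch confirming that the definitional and computational formulas agree (Python is my choice, not the slides'), again using the deck's example data:

```python
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

# SS_X: definitional formula vs. computational formula
ss_def = sum((x - mx) ** 2 for x in X)
ss_comp = sum(x * x for x in X) - sum(X) ** 2 / n

# SP_XY: definitional formula vs. computational formula
sp_def = sum((x - mx) * (y - my) for x, y in zip(X, Y))
sp_comp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

print(ss_def, ss_comp, sp_def, sp_comp)  # 64.0 64.0 28.0 28.0
```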

Computing Pearson's r

Pearson's r is computed by dividing the covariance by the product of the standard deviations of the two variables. This removes the effect of the variability of the individual variables:

    r = cov_XY / (s_X · s_Y) = SP_XY / √(SS_X · SS_Y)

Computing Pearson's r: Example

X    Y
0    2
10   6
4    2
8    4
8    6

Computing Pearson's r: Example

X    Y    XY
0    2    0
10   6    60
4    2    8
8    4    32
8    6    48

Compute SS_X, SS_Y, and SP_XY:

    SS_X = ΣX² − (ΣX)²/N = 244 − 30²/5 = 244 − 180 = 64
    SS_Y = ΣY² − (ΣY)²/N = 96 − 20²/5 = 96 − 80 = 16
    SP_XY = ΣXY − (ΣX)(ΣY)/N = 148 − (30)(20)/5 = 148 − 120 = 28

Compute r:

    r = SP_XY / √(SS_X · SS_Y) = 28 / √(64 · 16) = 28 / 32 = 0.875
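The worked example can be verified in a few lines (Python is an illustration choice, not part of the slides):

```python
import math

X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
n = len(X)

# Computational formulas for the sums of squares and sum of products
ss_x = sum(x * x for x in X) - sum(X) ** 2 / n            # 64
ss_y = sum(y * y for y in Y) - sum(Y) ** 2 / n            # 16
sp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n  # 28

r = sp / math.sqrt(ss_x * ss_y)
print(r)  # 0.875
```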

Computing Pearson's r: Example

Hypothesis testing for r: The null hypothesis is that the population correlation coefficient ρ = 0. The alternative hypothesis is that ρ ≠ 0.

    r_crit(df) = t_crit(df) / √(df + t²_crit), where df = N − 2

t-distribution Table

[Diagrams: one-tailed test shades area α beyond t; two-tailed test shades area α/2 beyond −t and beyond t]

        Level of significance for one-tailed test
        0.25    0.2     0.15    0.1     0.05    0.025    0.01     0.005    0.0005
        Level of significance for two-tailed test
df      0.5     0.4     0.3     0.2     0.1     0.05     0.02     0.01     0.001
1       1.000   1.376   1.963   3.078   6.314   12.706   31.821   63.657   636.619
2       0.816   1.061   1.386   1.886   2.920   4.303    6.965    9.925    31.599
3       0.765   0.978   1.250   1.638   2.353   3.182    4.541    5.841    12.924
4       0.741   0.941   1.190   1.533   2.132   2.776    3.747    4.604    8.610
5       0.727   0.920   1.156   1.476   2.015   2.571    3.365    4.032    6.869
6       0.718   0.906   1.134   1.440   1.943   2.447    3.143    3.707    5.959
7       0.711   0.896   1.119   1.415   1.895   2.365    2.998    3.499    5.408
8       0.706   0.889   1.108   1.397   1.860   2.306    2.896    3.355    5.041
9       0.703   0.883   1.100   1.383   1.833   2.262    2.821    3.250    4.781
10      0.700   0.879   1.093   1.372   1.812   2.228    2.764    3.169    4.587
11      0.697   0.876   1.088   1.363   1.796   2.201    2.718    3.106    4.437
12      0.695   0.873   1.083   1.356   1.782   2.179    2.681    3.055    4.318
13      0.694   0.870   1.079   1.350   1.771   2.160    2.650    3.012    4.221
14      0.692   0.868   1.076   1.345   1.761   2.145    2.624    2.977    4.140
15      0.691   0.866   1.074   1.341   1.753   2.131    2.602    2.947    4.073
16      0.690   0.865   1.071   1.337   1.746   2.120    2.583    2.921    4.015
17      0.689   0.863   1.069   1.333   1.740   2.110    2.567    2.898    3.965
18      0.688   0.862   1.067   1.330   1.734   2.101    2.552    2.878    3.922
19      0.688   0.861   1.066   1.328   1.729   2.093    2.539    2.861    3.883
20      0.687   0.860   1.064   1.325   1.725   2.086    2.528    2.845    3.850
21      0.686   0.859   1.063   1.323   1.721   2.080    2.518    2.831    3.819
22      0.686   0.858   1.061   1.321   1.717   2.074    2.508    2.819    3.792
23      0.685   0.858   1.060   1.319   1.714   2.069    2.500    2.807    3.768
24      0.685   0.857   1.059   1.318   1.711   2.064    2.492    2.797    3.745
25      0.684   0.856   1.058   1.316   1.708   2.060    2.485    2.787    3.725
26      0.684   0.856   1.058   1.315   1.706   2.056    2.479    2.779    3.707
27      0.684   0.855   1.057   1.314   1.703   2.052    2.473    2.771    3.690
28      0.683   0.855   1.056   1.313   1.701   2.048    2.467    2.763    3.674
29      0.683   0.854   1.055   1.311   1.699   2.045    2.462    2.756    3.659
30      0.683   0.854   1.055   1.310   1.697   2.042    2.457    2.750    3.646
40      0.681   0.851   1.050   1.303   1.684   2.021    2.423    2.704    3.551
50      0.679   0.849   1.047   1.299   1.676   2.009    2.403    2.678    3.496
100     0.677   0.845   1.042   1.290   1.660   1.984    2.364    2.626    3.390

Computing Pearson's r: Example

    r_crit(df) = t_crit(df) / √(df + t²_crit), where df = N − 2 = 5 − 2 = 3

    t_crit(3) = 3.182

    r_crit = 3.182 / √(3 + 3.182²) = 3.182 / 3.623 = 0.878
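This conversion from a critical t to a critical r can be checked numerically (Python is an assumption here; t_crit = 3.182 is the two-tailed, α = .05 value read from the t table for df = 3):

```python
import math

N = 5
t_crit = 3.182          # from the t table: df = 3, two-tailed alpha = .05
df = N - 2              # df = 3

# Critical r derived from the critical t
r_crit = t_crit / math.sqrt(df + t_crit ** 2)
print(round(r_crit, 3))  # 0.878
```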

Critical Values for Pearson's r

           Level of Significance for One-Tailed Test
           0.05    0.025   0.01    0.005   0.0005
           Level of Significance for Two-Tailed Test
df = n−2   0.1     0.05    0.02    0.01    0.001
1          0.988   0.997   1.000   1.000   1.000
2          0.900   0.950   0.980   0.990   0.999
3          0.805   0.878   0.934   0.959   0.991
4          0.729   0.811   0.882   0.917   0.974
5          0.669   0.754   0.833   0.875   0.951
6          0.621   0.707   0.789   0.834   0.925
7          0.582   0.666   0.750   0.798   0.898
8          0.549   0.632   0.715   0.765   0.872
9          0.521   0.602   0.685   0.735   0.847
10         0.497   0.576   0.658   0.708   0.823
11         0.476   0.553   0.634   0.684   0.801
12         0.458   0.532   0.612   0.661   0.780
13         0.441   0.514   0.592   0.641   0.760
14         0.426   0.497   0.574   0.623   0.742
15         0.412   0.482   0.558   0.606   0.725
16         0.400   0.468   0.543   0.590   0.708
17         0.389   0.456   0.529   0.575   0.693
18         0.378   0.444   0.516   0.561   0.679
19         0.369   0.433   0.503   0.549   0.665
20         0.360   0.423   0.492   0.537   0.652
21         0.352   0.413   0.482   0.526   0.640
22         0.344   0.404   0.472   0.515   0.629
23         0.337   0.396   0.462   0.505   0.618
24         0.330   0.388   0.453   0.496   0.607
25         0.323   0.381   0.445   0.487   0.597
26         0.317   0.374   0.437   0.479   0.588
27         0.311   0.367   0.430   0.471   0.579
28         0.306   0.361   0.423   0.463   0.570
29         0.301   0.355   0.416   0.456   0.562
30         0.296   0.349   0.409   0.449   0.554
40         0.257   0.304   0.358   0.393   0.490
50         0.231   0.273   0.322   0.354   0.443
100        0.164   0.195   0.230   0.254   0.321

Computing Pearson's r: Example

    r_crit = t_crit(3) / √(df + t²_crit) = 3.182 / √(3 + 3.182²) = 3.182 / 3.623 = 0.878

Since 0.875 < 0.878, we fail to reject H0: the correlation is not significant.

Linear Correlation: Assumptions

1. Linearity: assumes that the relationship between the paired scores is best described by a straight line.
2. Normality: assumes that the marginal score distributions, their joint distribution, and any conditional distributions are normally distributed.
3. Homoscedasticity: assumes that the variability around the regression line is homogeneous across different score values.

Other Correlation Coefficients

- Spearman's correlation coefficient (r_s) for ranked data: as the name suggests, Spearman's correlation is used when the scores for both X and Y consist of (or have been converted to) ordinal ranks.
- The point-biserial correlation coefficient (r_pb): used when one of the scores is continuous and the other is dichotomous, taking on one of only two possible values.
- The phi correlation coefficient (r_φ): used when both scores are dichotomous.

All of the above can be computed in the same manner as Pearson's correlation.
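The last point can be demonstrated in code (Python is my choice; the 0/1 group coding and the example scores below are made up for illustration, not from the slides). Applying the ordinary Pearson formula to a dichotomous variable coded 0/1 yields the point-biserial correlation:

```python
import math

def pearson_r(X, Y):
    """Pearson's r via the computational formulas for SS and SP."""
    n = len(X)
    sp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
    ss_x = sum(x * x for x in X) - sum(X) ** 2 / n
    ss_y = sum(y * y for y in Y) - sum(Y) ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)

# Point-biserial: dichotomous variable coded 0/1, then ordinary Pearson r.
group = [0, 0, 0, 1, 1, 1]    # hypothetical: e.g., control vs. treatment
scores = [3, 4, 5, 7, 8, 9]   # hypothetical continuous scores
print(round(pearson_r(group, scores), 3))  # 0.926
```

The same `pearson_r` function applied to ranked data gives Spearman's r_s, and applied to two 0/1 variables gives phi.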

Converting Data for Spearman's Correlation

Original Data        Converted Scores
Age    Height        Age Rank    Height Rank
10     31.4          1           1
11     41            2           2
12     47.8          3           3
13     52.8          4           4
14     55.7          5           5
15     58.3          6           6
16     60.7          7           7
17     62.1          8           8
18     62.7          9           9
19     63.3          10          10
20     64.1          11          11
21     64.3          12          12.5
22     64.6          13          15
23     64.7          14          16
24     64.5          15          14
25     64.3          16          12.5

Pearson's r on the original data: r = 0.86. On the ranked scores (Spearman's correlation): r = 0.97.
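The rank conversion, including the averaged rank of 12.5 for the two tied heights of 64.3, can be sketched as follows (Python is an assumption; the slides do not show code):

```python
def average_ranks(values):
    """Rank values 1..n, assigning tied values the mean of their tied positions."""
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1       # first 1-based position of v
        count = sorted_vals.count(v)           # number of tied occurrences
        ranks.append(first + (count - 1) / 2)  # average of the tied positions
    return ranks

heights = [31.4, 41, 47.8, 52.8, 55.7, 58.3, 60.7, 62.1,
           62.7, 63.3, 64.1, 64.3, 64.6, 64.7, 64.5, 64.3]
print(average_ranks(heights))
```

The two 64.3 values occupy positions 12 and 13 in the sorted order, so each receives the average rank (12 + 13) / 2 = 12.5, matching the slide's table.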

Converting Data for the Point Biserial Correlation

Converting Data for Phi Correlation