Chapter 16: Correlation

Similar documents
Chapter 16: Correlation

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Can you tell the relationship between students SAT scores and their college grades?

Correlation: Relationships between Variables

Reminder: Student Instructional Rating Surveys

Correlation and Linear Regression

REVIEW 8/2/2017 陈芳华东师大英语系

Measuring Associations : Pearson s correlation

THE PEARSON CORRELATION COEFFICIENT

Correlation. What Is Correlation? Why Correlations Are Used

Statistics Introductory Correlation

CORELATION - Pearson-r - Spearman-rho

Readings Howitt & Cramer (2014) Overview

Psych 230. Psychological Measurement and Statistics

Readings Howitt & Cramer (2014)

Understand the difference between symmetric and asymmetric measures

Chs. 16 & 17: Correlation & Regression

Chapter 13 Correlation

Review. Number of variables. Standard Scores. Anecdotal / Clinical. Bivariate relationships. Ch. 3: Correlation & Linear Regression

Analysing data: regression and correlation S6 and S7

Correlation & Regression Chapter 5

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Unit 6 - Introduction to linear regression

Statistics in medicine

Unit 6 - Simple linear regression

Chs. 15 & 16: Correlation & Regression

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Correlation. January 11, 2018

appstats8.notebook October 11, 2016

Correlation and regression

Identify the scale of measurement most appropriate for each of the following variables. (Use A = nominal, B = ordinal, C = interval, D = ratio.

psychological statistics

Data files for today. CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav

Business Statistics. Lecture 10: Correlation and Linear Regression

Contents. Acknowledgments. xix

Correlation & Simple Regression

Relationship Between Interval and/or Ratio Variables: Correlation & Regression. Sorana D. BOLBOACĂ

Notes 6: Correlation

Correlation measures the strength of the relationship between 2 variables.

Textbook Examples of. SPSS Procedure

Bivariate Relationships Between Variables

Stages in scientific investigation: Frequency distributions and graphing data: Levels of measurement:

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Practical Biostatistics

Ch. 16: Correlation and Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

9 Correlation and Regression

Correlation and simple linear regression S5

MATH 1150 Chapter 2 Notation and Terminology

AMS 7 Correlation and Regression Lecture 8

Bivariate statistics: correlation

Chapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing

LOOKING FOR RELATIONSHIPS

Correlation and Regression Bangkok, 14-18, Sept. 2015

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Key Concepts. Correlation (Pearson & Spearman) & Linear Regression. Assumptions. Correlation parametric & non-para. Correlation

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

Chapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals

1 A Review of Correlation and Regression

General Linear Model (Chapter 4)

PS2: Two Variable Statistics

How can we explore the association between two quantitative variables?

Intro to Linear Regression

Lecture 8: Summary Measures

Intro to Linear Regression

Chapter 7: Correlation

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation

CRP 272 Introduction To Regression Analysis

Practice Questions for Exam 1

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM

CORRELATION. compiled by Dr Kunal Pathak

PS5: Two Variable Statistics LT3: Linear regression LT4: The test of independence.

Introduction and Single Predictor Regression. Correlation

CORRELATION. suppose you get r 0. Does that mean there is no correlation between the data sets? many aspects of the data may a ect the value of r

Instrumentation (cont.) Statistics vs. Parameters. Descriptive Statistics. Types of Numerical Data

Vocabulary: Data About Us

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

Pre-Calculus Multiple Choice Questions - Chapter S8

Homework 6. Wife Husband XY Sum Mean SS

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Introduction to inferential statistics. Alissa Melinger IGK summer school 2006 Edinburgh

Lecture 5: ANOVA and Correlation

Statistics Handbook. All statistical tables were computed by the author.

Chapter 3. Measuring data

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

Correlation. Engineering Mathematics III

CHAPTER 2. Types of Effect size indices: An Overview of the Literature

Chapter 8. Linear Regression /71

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Correlation 1. December 4, HMS, 2017, v1.1

Chapter 4 Regression with Categorical Predictor Variables Page 1. Overview of regression with categorical predictors

HUMBEHV 3HB3 Correlation Week 3

Correlation and regression

Review of Multiple Regression

Chapter 10 Correlation and Regression

Transcription:

Chapter : Correlation So far We ve focused on hypothesis testing Is the relationship we observe between x and y in our sample true generally (i.e. for the population from which the sample came) Which answers the following question: Is there a relationship between x and y? (Yes or No) Where x is a categorical predictor and y is a continuous predictor A new question If there is a relationship between x and y How strong is that relationship? How well can we predict a person s y score if we know x? What is the strength of the relationship or the correlation between x and y

Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy

Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy 3

Mean Recall Rote Rehearsal Sentence Interactive Imagery Memory Strategy However What if x is not a categorical variable What if x is a continuous predictor e.g. arousal level And y is a continuous variable as well e.g. performance level Mean Performance Level Low Medium High Arousal Level

Mean Performance Level Low Medium High Arousal Level Mean Performance Level Low Medium High Arousal Level Mean Performance Level Low Medium High Arousal Level 5

Mean Performance Level Low Medium High Arousal Level Mean Performance Level Arousal Level Mean Performance Level Arousal Level

Mean Performance Level Arousal Level Mean Performance Level Arousal Level 9 7 5 3 5 7 Time to complete exam (in minutes) 7

Person X Y A B 3 C 3 D 5 E F 7 5 5 3 B A C D E F 3 5 7 X values 3 Characteristics of a Correlation: Direction of relationship Form of the relation Degree of the relationship Correlations: Measuring and Describing Relationships (cont.) The direction of the relationship is measured by the sign of the correlation (+ or -). A positive correlation means that the two variables tend to change in the same direction; as one increases, the other also tends to increase. A negative correlation means that the two variables tend to change in opposite directions; as one increases, the other tends to decrease.

5 3 3 5 7 Temperature (in degrees F) 5 3 3 5 7 Temperature (in degrees F) 9

Correlations: Measuring and Describing Relationships (cont.) The most common form of relationship is a straight line or linear relationship which is measured by the Pearson correlation. (a) Performance 3 3 5 7 Amount of practice Vocabulary score (b) Male Female

Correlations: Measuring and Describing Relationships (cont.) The degree of relationship (the strength or consistency of the relationship) is measured by the numerical value of the correlation. A value of. indicates a perfect relationship and a value of zero indicates no relationship. 9 7 5 3 5 7 9 7 5 3 5 7

Y (a) Y (b) X X Y (c) Y (d) X X Where and Why Correlations are Used: Prediction Validity Reliability Theory Verification Correlations: Measuring and Describing Relationships (cont.) To compute a correlation you need two scores, X and Y, for each individual in the sample. The Pearson correlation requires that the scores be numerical values from an interval or ratio scale of measurement. Other correlational methods exist for other scales of measurement.

The Pearson Correlation The Pearson correlation measures the direction and degree of linear (straight line) relationship between two variables. To compute the Pearson correlation, you first measure the variability of X and Y scores separately by computing SS for the scores of each variable (SS X and SS Y ). Then, the covariability (tendency for X and Y to vary together) is measured by the sum of products (SP). The Pearson correlation is found by computing the ratio: SP ( SS x )( SS y ) The Pearson Correlation (cont.) Thus the Pearson correlation is comparing the amount of covariability (variation from the relationship between X and Y) to the amount X and Y vary separately. The magnitude of the Pearson correlation ranges from (indicating no linear relationship between X and Y) to. (indicating a perfect straight-line relationship between X and Y). The correlation can be either positive or negative depending on the direction of the relationship. The Pearson Correlation r = r = degree to which x and y vary together degree to which x and y vary separately covariability of x and y variability of x and y separately r = SP ( SS x )( SS y ) SP = SP = xy ( x x) y y ( ) x n y 3

The Pearson Correlation r = r = degree to which x and y vary together degree to which x and y vary separately covariability of x and y r variability of x and y separately r = SP ( SSx)( SSy) / SP = ( x - x )( y -y) / SP = xy - / x n / y PDF version Computational Examples Computing the SP Scores x y 3 5 7 Deviations ( x- x) ( y- y ) Products ( x- x) y- y ( )

Computing the SP Scores x y Deviations ( x- x) ( y- y ) Products ( x- x) y- y ( ) 3 - - + - + - + - - 5 7 + + + + = SP Computing the SP with the Computational Formula x y xy 5 3 7 3 35 x = y = xy = / SP = xy - / SP = () SP = / x y n SP = + X Y 3 3 5

- a. * for - - //7 it Computing a Pearson Correlation Scores x y 3 5 7. First draw a scatterplot of the x and y data pairs.. Then compute the Pearson r correlation coefficient 3. Compare the scatterplot to the calculated Pearson r Understanding & Interpreting the Pearson Correlation Correlation is not causation Correlation greatly affected by the range of scores represented in the data One or two extreme data points (outliers) can dramatically affect the value of the correlation How accurately one variable predicts the other the strength of a relation

Correlation is not causation Scatterplot of Number of Serious Crimes and Number of Churches Number of Serious Crimes Number of Churches Y Values X When values we restricted look at the to full a range limited of range X, the correlation changes. X Values 7

Restricted Range Problem Y Values X values restricted to a limited range X Values Outliers and Correlations Fig. -9, p. 533

Coefficient of Determination (r ) With r =, None of the Y variability can be predicted from X; r = X Y X Y X Y With r =., the Y variability is partially predicted from the relation with X; r =. or % With r =., the Y variability is completely predicted from the relation with X; r =. or % r versus r r = r = r =. r =. or % r =. r =. or % Different Types of Correlation Pearson Correlation Used only when x and y are both interval or ratio scales If either X or Y are not interval or ratio scales, then we compute a different correlation coefficient. 9

Other Types of Correlation Spearman Correlation Used when either x or y (or both) are ordinal scales Point-Biserial: Used when one variable is interval or ratio and the other variable is dichotomous (two values e.g. male/female) Phi-Coefficient Used when both variables (x and y) are dichotomous. The Spearman Correlation The Spearman correlation is used in two general situations: () When X and Y are both consist of ranks (ordinal scales). () When X and Y are interval/ratio BUT there are outliers in the data--both variables can be converted to ranks and a Spearman correlation is computed. The Spearman Correlation (cont.) The calculation of the Spearman correlation requires:. Two variables are observed for each individual.. The observations for each variable are rank ordered. The X values and Y values are ranked separately. 3. After the variables have been ranked, the Spearman correlation is computed by using the same formula used for the Pearson but substituting the ranked data.

This outlier will change the correlation dramatically. Subject X Y A 3 B 7 C 9 D E 9 So we transform our scores into ranks Original Scores Subject X Y A 3 B 7 C 9 D E 9 Convert Original Scores to Ranks Subject X Rank Y Rank A 3 ( st ) ( nd ) B 7 (3 rd ) ( st ) C ( nd ) 9 (3 rd ) D ( th ) ( th ) E 9 (5 th ) (5 th ) Compute Correlation on the Ranks Subject X Rank Y Rank A B 3 C 3 D E 5 5 Ranking Tied Scores. List the scores from smallest to largest. Include tied values in the list.. Assign a rank ( st, nd, 3 rd, etc.) to each score in the ordered list. 3. When two (or more) scores are tied, compute the mean of their ranks and assign this mean value as the rank for each of the tied scores.

For example Here are some scores with ties: 3, 5, 3,,,,. List the scores from low to high.. Assign a rank to each score 3. For each group of tied scores, compute a mean rank and assign it to each tied score in that group. Scores Rank Final Rank 3 3 5 3 5 7.5.5 3 5 5 5 7 The Point-Biserial Correlation and the Phi Coefficient The same correlation formula can also be used to measure the relationship between two variables when one or both of the variables is dichotomous. A dichotomous variable is one for which there are exactly two categories: for example, men/women or succeed/fail. The Point-Biserial Correlation and the Phi Coefficient (cont.) In situations where one variable is dichotomous and the other consists of regular numerical scores (interval or ratio scale), the resulting correlation is called a point-biserial correlation. When both variables are dichotomous, the resulting correlation is called a phi-coefficient.

Point-Biserial Correlation Used in situations where one variable is measured on an interval or ratio scale but the second variable has only two different values (i.e. a dichotomous variable). Examples:. Vocabulary scores for Males versus Females. IQ scores for college grad versus non college grads 3. Memory scores for visual imagery group versus rote rehearsal group Original Data Converted Data Attitude Score (Y) Gender (X) Attitude Score (Y) Gender (X) 7 9 3 Male Female Male Male Female Male Female Female 7 9 3 The Point-Biserial Correlation (cont.) The point-biserial correlation is closely related to the independent-measures t test introduced in Chapter. When the data consists of one dichotomous variable and one numerical variable, the dichotomous variable can also be used to separate the individuals into two groups. Then, it is possible to compute a sample mean for the numerical scores in each group. 3

The Point-Biserial Correlation (cont.) In this case, the independent-measures t test can be used to evaluate the mean difference between groups. The t-statistic and the point-biserial correlation coefficient are mathematically related. One can be derived from the other. t and r are related Remember: df = n -

Phi Coefficient Original Data Converted Data Birth Order (X) st 3 rd Onl y nd th nd Onl y 3 rd Personality (Y) Introvert Extrovert Extrovert Extrovert Extrovert Introvert Introvert Extrovert Birth Order (X) Personality (Y) 5