Correlation. A statistical method to measure the relationship between two variables.


Correlation

Correlation: a statistical method to measure the relationship between two variables. A relationship has three characteristics:
- Direction of the relationship
- Form of the relationship
- Strength/consistency of the relationship

Direction of the Relationship
- Positive correlation: the variables move in the same direction (as X increases, Y tends to increase).
- Negative correlation: the variables move in opposite directions (as X increases, Y tends to decrease).

Form of the Relationship
The relationship can be linear or non-linear. Knowing its form lets us predict data.

Strength/Consistency
How well do the data fit the specific form? This is measured by the distance between the actual data and the predicted data. The absolute value of a correlation measures the fit: 1 means a perfect fit, 0 means no fit at all.

Correlation Measures
- The Pearson correlation measures a linear relationship. The sign of the correlation gives the direction; the numerical value gives the degree of the relationship.
- The Spearman correlation is used for an ordinal scale of measurement: both the X values and the Y values are ranks. It measures the consistency of a relationship that is not necessarily linear.
- The point-biserial correlation measures the correlation between a regular (numerical) variable and a dichotomous variable.

The Pearson Correlation
$$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}} = \frac{SP}{\sqrt{SS_X \, SS_Y}}$$

$$SP = \sum (X - M_X)(Y - M_Y) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$
$$SS = \sum (X - M)^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
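As a quick check that the definitional and computational forms of SP and SS agree, here is a minimal Python sketch. The data values and the helper function names are hypothetical (not from the slides); the sketch computes SP and SS both ways and then r = SP / √(SS_X SS_Y).

```python
import math

def sp_definitional(xs, ys):
    """SP = sum of (X - M_X)(Y - M_Y)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def sp_computational(xs, ys):
    """SP = sum(XY) - sum(X) * sum(Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def ss_definitional(xs):
    """SS = sum of (X - M)^2."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def ss_computational(xs):
    """SS = sum(X^2) - (sum X)^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    return sp_definitional(xs, ys) / math.sqrt(ss_definitional(xs) * ss_definitional(ys))

# Hypothetical example data (not taken from the slides)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

print(sp_definitional(X, Y), sp_computational(X, Y))  # 6.0 6.0
print(ss_definitional(X), ss_computational(X))        # 10.0 10.0
print(round(pearson_r(X, Y), 4))                      # 0.7746
```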

Check the Result
Use a scatterplot of the data: draw an envelope around all the data points and check the direction and shape of the envelope.
[Figure: scatterplot of the example X and Y data]

Interpreting Correlations
- Prediction: a correlation lets us predict one variable from the other.
- A correlation only describes the relationship between two variables. It does not necessarily imply causation!
- The value can be greatly affected by the range of the data.
- Outliers can dramatically affect the value.

Data Range and Correlation

Outlier and Correlation
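A small numerical illustration of the outlier effect, using made-up data (an assumption, not the data behind the slide's figure): five points with no linear trend at all, then the same five points plus one extreme point.

```python
import math

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: five points with no linear trend (SP = 0)...
X = [1, 2, 3, 4, 5]
Y = [4, 1, 3, 5, 2]
r_without = pearson_r(X, Y)

# ...and the same points plus a single extreme point far from the rest
r_with = pearson_r(X + [20], Y + [20])

print(round(r_without, 3), round(r_with, 3))  # 0.0 0.96
```

One point out of six is enough to turn "no relationship" into an apparently very strong one, which is why the scatterplot should always be checked.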

The Strength of Relationship

The Strength of Relationship
The coefficient of determination, r², is the square of the correlation. It measures how much of the variance in the dependent variable is accounted for by the independent variable, similar to the effect-size measures used with z- and t-tests.
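A trivial sketch (the helper name is hypothetical) showing that r² depends only on the magnitude of r, using r values that appear elsewhere in these slides:

```python
def coefficient_of_determination(r):
    """Return r**2, the proportion of variance in Y accounted for by X."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r ** 2

for r in (0.35, 0.65, -0.65):
    print(r, round(coefficient_of_determination(r), 4))
# 0.35 0.1225
# 0.65 0.4225
# -0.65 0.4225
```

So r = 0.65 accounts for about 42% of the variance, while r = 0.35 accounts for only about 12%.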

Hypothesis Tests with the Pearson Correlation
The Pearson correlation is usually computed for sample data but used to test hypotheses about the relationship in the population. The population correlation is denoted by the Greek letter rho (ρ).
- Non-directional: $H_0: \rho = 0$ and $H_1: \rho \neq 0$
- Directional: $H_0: \rho \le 0$ and $H_1: \rho > 0$, or
- Directional: $H_0: \rho \ge 0$ and $H_1: \rho < 0$

Population vs. Sample

Correlation Hypothesis Test
The sample correlation r is used to test the population ρ. The test can be computed using either t or F; use the t table to find the critical value.
$$t = \frac{r}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{df}}$$

About df
What should df be? For a sample of size n, df = n − 2, so
$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$$

Example: α = .05, n = 30, r = 0.35
$$t = \frac{0.35}{\sqrt{(1 - 0.35^2)/28}} \approx 1.98$$
- Two-tailed test: critical value ±2.048, so we fail to reject the null hypothesis.
- One-tailed test: critical value 1.701, so we reject the null hypothesis.
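The example can be reproduced with a short Python sketch. The critical t values are copied from the example rather than computed (the Python standard library has no t distribution), and the helper names are hypothetical. The sketch also converts the two-tailed critical t into the equivalent critical r via r = t / √(t² + df), which lets r be compared directly with a critical value.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r / math.sqrt((1 - r ** 2) / df)

def critical_r(t_crit, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

r, n = 0.35, 30
t = t_for_r(r, n)
print(round(t, 3))  # 1.977

# Critical values for df = 28, alpha = .05, as quoted in the example
T_TWO_TAILED = 2.048
T_ONE_TAILED = 1.701
print("two-tailed:", "reject" if abs(t) > T_TWO_TAILED else "fail to reject")
print("one-tailed:", "reject" if t > T_ONE_TAILED else "fail to reject")

# Equivalent two-tailed critical r for df = 28: r = 0.35 falls just short of it
print(round(critical_r(T_TWO_TAILED, n - 2), 3))  # 0.361
```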

Using r Directly

Reporting Correlations
Example: A correlation for the data revealed a significant relationship between amount of education and annual income, r(28) = 0.65, p < .01, two-tailed.

Usually, Multiple Variables Are Involved in Correlation Tests

Partial Correlation
Are other factors involved in the correlation?

Partial Correlation

Partial Correlation
A partial correlation measures the relationship between two variables while mathematically controlling for the influence of a third variable by holding it constant:
$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

Example
X = number of churches, Y = number of crimes, Z = population

X    Y    Z
1    4    1
2    3    1
3    1    1
4    2    1
5    5    1
7    8    2
8    11   2
9    9    2
10   7    2
11   10   2
13   15   3
14   14   3
15   16   3
16   17   3
17   13   3

Controlling for population: $r_{xy \cdot z} = 0$
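A minimal sketch that applies the partial-correlation formula to the church/crime/population table above (the helper names are hypothetical). The simple correlation between churches and crimes is strong, but it vanishes once population is held constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def partial_r(x, y, z):
    """Correlation of x and y with z held constant."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

churches = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17]
crimes = [4, 3, 1, 2, 5, 8, 11, 9, 7, 10, 15, 14, 16, 17, 13]
population = [1] * 5 + [2] * 5 + [3] * 5

print(round(pearson_r(churches, crimes), 3))  # 0.923
print(partial_r(churches, crimes, population))  # approximately 0 (floating-point residue possible)
```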

What if the relationship looks like this?

The Spearman Correlation
The Spearman correlation measures the degree of consistency of direction; the relationship is not necessarily linear. It adds one step before calculating the Pearson correlation: rank the X values and the Y values, then compute the correlation of the ranks.

X value   Y value   X rank   Y rank
1         3         2        2
6         4         4        3
2         5         3        4
0         2         1        1

Ranking Tied Scores
Rank all the scores, then give tied scores the same rank: the mean of the ranked positions they occupy. Below, the two Y scores of 3 occupy positions 2 and 3, so each receives rank 2.5.

X value   Y value   X rank   Y rank
1         3         2        2.5
6         3         4        2.5
2         5         3        4
0         2         1        1
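A sketch of the ranking step in Python, assuming ties get the mean of their positions as described above; the Spearman correlation is then just the Pearson correlation of the ranks. The data are the two small tables above; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_position = (i + j) / 2 + 1  # positions are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_position
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def spearman_r(xs, ys):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

X = [1, 6, 2, 0]
Y_no_ties = [3, 4, 5, 2]
Y_ties = [3, 3, 5, 2]

print(average_ranks(X))        # [2.0, 4.0, 3.0, 1.0]
print(average_ranks(Y_ties))   # [2.5, 2.5, 4.0, 1.0]
print(round(spearman_r(X, Y_no_ties), 3))  # 0.8
print(round(spearman_r(X, Y_ties), 3))     # 0.632
```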

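Special Formula for Spearman Correlation
For ranks 1 through n, $SS = \frac{n(n^2 - 1)}{12}$, which simplifies the Pearson formula to
$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
where D is the difference between the X rank and the Y rank for each individual.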
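A sketch checking the shortcut formula on the untied ranks from the first Spearman table (the helper names are hypothetical). The formula assumes the ranks are 1 through n with no ties; with tied ranks it only approximates the Pearson correlation of the ranks.

```python
def ss_of_ranks(n):
    """SS of the ranks 1..n: n(n^2 - 1) / 12."""
    return n * (n ** 2 - 1) / 12

def spearman_shortcut(rank_x, rank_y):
    """r_s = 1 - 6 * sum(D^2) / (n(n^2 - 1)); assumes ranks 1..n with no ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rank_x = [2, 4, 3, 1]
rank_y = [2, 3, 4, 1]
print(ss_of_ranks(4))                      # 5.0
print(spearman_shortcut(rank_x, rank_y))   # 0.8, same as Pearson r on these ranks
```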

The Point-Biserial Correlation
It is computed just like the Pearson correlation, but one variable has only two values (e.g., gender, success/failure, college education or not). The size of the correlation does not depend on the two numeric codes used in the study (1/0, 1/−1, etc.); reversing the coding only changes the sign.

Point-Biserial Correlation vs. t Test
- t test: t = 4, df = 18, p < .001
- Point-biserial correlation: r = 0.686
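The two results are linked by r² = t² / (t² + df): 16 / (16 + 18) ≈ 0.47, so r ≈ 0.686. The sketch below checks this on hypothetical two-group data, assuming the t test is the pooled-variance independent-measures t test, and also shows that coding the groups 0/1 or −1/1 gives the same r. All data and helper names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def independent_t(a, b):
    """Pooled-variance independent-measures t (mean of b minus mean of a) and its df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mb - ma) / math.sqrt(pooled_var / na + pooled_var / nb)
    return t, df

def r_from_t(t, df):
    """Magnitude of the point-biserial r implied by t: r^2 = t^2 / (t^2 + df)."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

group_a = [2, 3, 4]   # hypothetical scores for one group
group_b = [5, 6, 7]   # hypothetical scores for the other group
scores = group_a + group_b

r_01 = pearson_r([0] * 3 + [1] * 3, scores)     # groups coded 0/1
r_pm1 = pearson_r([-1] * 3 + [1] * 3, scores)   # groups coded -1/1
t, df = independent_t(group_a, group_b)

print(round(r_01, 3), round(r_pm1, 3), round(r_from_t(t, df), 3))  # 0.878 0.878 0.878
print(round(r_from_t(4, 18), 3))  # 0.686, the t = 4, df = 18 example
```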

If we know two variables are linearly related, how can we describe the relationship? With a linear equation, Y = bX + a.

Regression

Goal of Regression
Determine the two constants of the linear equation Y = bX + a, where b is the slope and a is the intercept. Method: the least-squares solution.

Distance = $Y - \hat{Y}$; the least-squares solution minimizes $\sum (Y - \hat{Y})^2$

Formula
$$b = \frac{SP}{SS_X}, \qquad a = M_Y - b M_X$$
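A minimal least-squares sketch of these two formulas, using the X/Y data listed on the "In Excel" slide of this lecture (the helper name is hypothetical):

```python
def least_squares(xs, ys):
    """Return (b, a) for the line Y-hat = bX + a that minimizes sum((Y - Y-hat)**2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("X has no variability; the slope is undefined")
    b = sp / ss_x        # b = SP / SS_X
    a = my - b * mx      # a = M_Y - b * M_X
    return b, a

# Data from the "In Excel" slide of this lecture
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
b, a = least_squares(X, Y)
print(b, a)          # 2.0 -1.0
print(b * 6 + a)     # predicted Y for X = 6: 11.0
```

Here SP = 56 and SS_X = 28, so the fitted line is Ŷ = 2X − 1.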

Regression in Excel: draw a scatterplot, then show the trendline.

Linear Equations and Regression
The Pearson correlation measures a linear relationship between two variables. Drawing the best-fitting line through the data points:
- makes the relationship easier to see,
- shows the central tendency of the relationship, and
- can be used for prediction.

Linear Equations
The general equation for a line is Y = bX + a, where X and Y are variables and a and b are fixed constants.

Regression
Regression is a method of finding the equation that describes the best-fitting line for a set of data. The least-squares method minimizes the errors on the known data, i.e., the error of prediction.

Error of Prediction
With the linear function from regression, we can calculate the predicted value $\hat{Y}$ for a given X. The error of prediction is $Y - \hat{Y}$; it is often squared.

Standard Error of Estimate
The regression equation makes a prediction. The precision of that prediction is measured by the standard error of estimate (SEoE):
$$SEoE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

Relationship Between Correlation and Standard Error of Estimate
$$SS_{regression} = r^2 \, SS_Y$$
$$SS_{residual} = (1 - r^2) \, SS_Y$$
$$SEoE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) \, SS_Y}{n - 2}}$$
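A sketch that computes SS_residual two ways, directly from the prediction errors and via r², and then the standard error of estimate, using the X/Y data from the "In Excel" slide of this lecture:

```python
import math

# Data from the "In Excel" slide of this lecture
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)

b = sp / ss_x                    # least-squares slope
a = my - b * mx                  # least-squares intercept
r = sp / math.sqrt(ss_x * ss_y)  # Pearson correlation

# Route 1: sum the squared prediction errors directly
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

# Route 2: partition SS_Y using r squared
ss_regression_from_r = r ** 2 * ss_y
ss_residual_from_r = (1 - r ** 2) * ss_y

seoe = math.sqrt(ss_residual / (n - 2))  # df = n - 2

print(ss_residual, round(ss_residual_from_r, 6))  # 44.0 44.0
print(round(ss_regression_from_r, 6))             # 112.0
print(round(seoe, 3))                             # 2.708
```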

Testing Regression Significance
Analysis of regression is similar to analysis of variance. It uses an F-ratio of two mean square (MS) values, where each MS is an SS divided by its df. $H_0$: the slope of the regression line (b or beta) is zero, i.e., there is no regression.

Mean Squares and F-ratio
$$MS_{regression} = \frac{SS_{regression}}{df_{regression}}, \qquad MS_{residual} = \frac{SS_{residual}}{df_{residual}}, \qquad F = \frac{MS_{regression}}{MS_{residual}}$$
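A sketch of the analysis-of-regression F-ratio for simple linear regression (one predictor, so df_regression = 1 and df_residual = n − 2), again using the "In Excel" data from this lecture. The resulting F would then be compared with a critical value from an F table for df = (1, n − 2).

```python
# Data from the "In Excel" slide of this lecture
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)

b = sp / ss_x
a = my - b * mx
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))
ss_regression = ss_y - ss_residual  # SS_Y = SS_regression + SS_residual

df_regression = 1       # one predictor
df_residual = n - 2
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
F = ms_regression / ms_residual

print(ss_regression, ss_residual)   # 112.0 44.0
print(df_regression, df_residual)   # 1 6
print(round(F, 2))                  # 15.27
```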

SS and df in Regression Analysis

SPSS Output Example

In Excel
X    Y
5    10
1    4
4    5
7    11
6    15
4    6
3    5
2    0

ANOVA and Regression
They are basically the same method, but they look at the results from different perspectives:
- A main effect in ANOVA corresponds to a variable in regression.
- An interaction between two factors corresponds to the product of two variables in regression.
- Regression not only tells whether there is a difference but also predicts how large it is.
- With more variables, this extends to multivariate regression.

Linear or Non-Linear Regression?
Linear models are usually good enough for most research in IST. If non-linear relationships are involved, how can you tell that the linear model you have is not appropriate? Look at the distribution of the residuals.
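A minimal residual check, using hypothetical data with a perfect but curved (quadratic) relationship. The fitted line is flat (b = 0, so r = 0 as well), yet the residuals are not random: they are positive at both ends and negative in the middle, a U-shape that signals a non-linear form.

```python
def least_squares(xs, ys):
    """Return (b, a) for the least-squares line Y-hat = bX + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    return b, my - b * mx

# Hypothetical data: Y = X^2, a perfect but non-linear relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

b, a = least_squares(xs, ys)
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
print(b, a)        # 0.0 4.0
print(residuals)   # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```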

In Summary
- Correlation: the relationship between two variables, described by its direction, form, and degree. Three methods (Pearson, Spearman, point-biserial) serve different purposes.
- Regression: determining the linear equation that best fits the data, i.e., its slope and intercept.

Homework Three problems to solve.