Correlation (Pearson & Spearman) & Linear Regression


Azmi Mohd Tamil

Key Concepts
Correlation as a statistic; positive and negative bivariate correlation; range effects; outliers; regression and prediction; the directionality problem (and cross-lagged panel designs); the third variable problem (and partial correlation).

Assumptions
Related pairs. Scale of measurement: for Pearson, the data should be interval or ratio in nature. Normality. Linearity. Homoscedasticity. An example of a non-linear relationship is the Yerkes-Dodson law (performance improves as stress rises, then worsens at high stress), which is not suitable for correlation. [Figure: inverted-U curve of performance (worse to better) against stress (low to high); diagram of a correlation between Stress and Illness.] A short assumption-check sketch follows at the end of this section.

Correlation: parametric and non-parametric
Two continuous variables: Pearson, for a linear relationship, e.g. the association between height and weight.
One continuous and one categorical (ordinal) variable: Spearman or Kendall, e.g. the association between a Likert scale of work satisfaction and work output, or between pain intensity (none, mild, moderate, severe) and the dosage of pethidine.
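A minimal sketch of these assumption checks, using simulated stand-in data for two hypothetical continuous variables (height and weight); the values, the sample size, and the use of the Shapiro-Wilk test are illustrative choices, not taken from the slides.

```python
# Minimal sketch of pre-correlation assumption checks on two hypothetical
# continuous variables (height, weight); simulated data, not the slides' data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
height = rng.normal(165, 10, size=32)              # cm, simulated
weight = 0.5 * height + rng.normal(0, 5, size=32)  # kg, roughly linear in height

# Normality of each variable (Shapiro-Wilk); p > 0.05 suggests no departure.
for name, x in [("height", height), ("weight", weight)]:
    w, p = stats.shapiro(x)
    print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Linearity and outliers are usually judged from a scatter plot, e.g.:
# import matplotlib.pyplot as plt; plt.scatter(height, weight); plt.show()
```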

Pearson Correlation
Used for two continuous variables with a linear relationship, e.g. the association between height and weight. It measures the degree of linear association between two interval-scaled variables, i.e. it analyses the relationship between two quantitative outcomes.

How to calculate r? Example:
Σx = 4631, Σx² = 688837, Σy = 2863, Σy² = 264527, Σxy = 424780, n = 32
a = Σxy - (Σx)(Σy)/n = 424780 - (4631 × 2863 / 32) = 10450.22
b = Σx² - (Σx)²/n = 688837 - 4631²/32 = 18644.47
c = Σy² - (Σy)²/n = 264527 - 2863²/32 = 8377.969
r = a / (b × c)^0.5 = 10450.22 / (18644.47 × 8377.969)^0.5 = 0.836144
t = r × ((n - 2) / (1 - r²))^0.5 = 0.836144 × ((32 - 2) / (1 - 0.836144²))^0.5 = 8.349436, with d.f. = 30, so p < 0.001. (A short code sketch of this calculation follows below.)

Interpreting Correlations
A correlation statistic carries two pieces of information: the strength of the relationship and the direction of the relationship. Causal interpretation, however, is problematic (see below).
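A short sketch reproducing the hand calculation above from the quoted summary totals; only the sums and n come from the slide, the code itself is illustrative.

```python
# Reproduce the slide's hand calculation of Pearson's r and its t statistic
# from the summary totals given above (Σx, Σx², Σy, Σy², Σxy, n).
import math

sum_x, sum_x2 = 4631, 688837
sum_y, sum_y2 = 2863, 264527
sum_xy, n = 424780, 32

a = sum_xy - sum_x * sum_y / n           # cross-product term
b = sum_x2 - sum_x**2 / n                # sum of squares of x
c = sum_y2 - sum_y**2 / n                # sum of squares of y
r = a / math.sqrt(b * c)

t = r * math.sqrt((n - 2) / (1 - r**2))  # test of H0: rho = 0, d.f. = n - 2
print(f"r = {r:.6f}, t = {t:.6f}, d.f. = {n - 2}")
# Expected output: r ≈ 0.836144, t ≈ 8.349, matching the slide.
```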

Strength of relationship
How to interpret the value of r? r lies between -1 and +1. Values near 0 mean no (linear) correlation; values near ±1 mean a very strong correlation. [Figure: scale from -1.0 (strong negative correlation) through 0.0 (no relationship) to +1.0 (strong positive correlation).]

Positive correlation: high values of one variable are associated with high values of the other. Example: higher course entrance exam scores are associated with better course grades in the final exam (positive and linear).
Negative correlation: the negative sign means that the two variables are inversely related, that is, as one variable increases the other decreases. Example: an increase in body mass index is associated with reduced effort tolerance (negative and linear).

Pearson's r and the Coefficient of Determination
An r of .9 is a very strong positive association (as one variable rises, so does the other); an r of -.9 is a very strong negative association (as one variable rises, the other falls). Note that r = .9 has nothing to do with 90%. Pearson's r can be squared (r²) to derive the coefficient of determination: the portion of variability in one of the variables that can be accounted for by variability in the second variable.

Coefficient of Determination and Pearson's r
Pearson's r can be squared (r²) to derive the coefficient of determination. Example of depression and CGPA: Pearson's r shows a negative correlation, r = -.5, so r² = .25. In this example we can say that 1/4 or .25 of the variability in CGPA scores can be accounted for by depression; the remaining 75% of the variability is due to other factors (habits, ability, motivation, courses studied, etc.).
If r = .5, then r² = .25; if r = .7, then r² = .49. Thus while r = .5 versus r = .7 might not look so different in terms of strength, r² tells us that r = .7 accounts for about twice the variability relative to r = .5. (A short numeric sketch follows at the end of this section.)

Correlation Does Not Imply Causality
A study was done to find the association between mothers' weight and their babies' birth weight. [Figure: scatter diagram of baby's birth weight against mother's weight, R² (linear) = 0.204.] The coefficient of correlation (r) is 0.452 and the coefficient of determination (r²) is 0.204: twenty percent of the variability of the babies' birth weight is accounted for by the variability of the mothers' weight. Causality, however, must demonstrate that variance in one variable can only be due to the influence of the other variable. Two obstacles stand in the way: the directionality of effect problem and the third variable problem. CORRELATION DOES NOT MEAN CAUSATION.

Directionality of Effect Problem
A high correlation does not give us the evidence to make a cause-and-effect statement. A common example is the high correlation between the cost of damage in a fire and the number of firemen helping to put out the fire. Does it mean that, to cut down the cost of damage, the fire department should dispatch fewer firemen to a fire rescue? No: it is the intensity of the fire that is highly correlated with both the cost of damage and the number of firemen dispatched. Consider also the high correlation between smoking and lung cancer. One may argue that both could be caused by stress and that smoking does not cause lung cancer. In this case, the correlation between lung cancer and smoking may well reflect a cause-and-effect relationship (supported by clinical experience and common sense), but to establish such a cause-and-effect relationship, controlled experiments should be performed.
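A tiny numeric sketch of the coefficient of determination, using the r values quoted above; purely illustrative.

```python
# Coefficient of determination for the r values discussed above:
# depression/CGPA r = -0.5, birth weight r = 0.452, and the .5 vs .7 comparison.
for r in (-0.5, 0.452, 0.5, 0.7):
    r2 = r ** 2
    print(f"r = {r:+.3f} -> r^2 = {r2:.3f} ({r2:.0%} of variability explained)")

# r = 0.7 explains about twice the variability of r = 0.5:
print(0.7**2 / 0.5**2)   # 1.96
```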

Directionality of Effect Problem (examples)
Does class attendance lead to higher grades, or do higher grades lead to class attendance? Does viewing violent TV lead to aggressive behaviour, or does aggressive behaviour lead to viewing violent TV? Aggressive children may prefer violent programs, or violent programs may promote aggressive behaviour.

Methods for Dealing with Directionality: the Cross-Lagged Panel
A cross-lagged panel design is a type of longitudinal design that investigates correlations at several points in time. It is still not causal. [Figure: cross-lagged panel of preference for violent TV and aggression measured in 3rd grade and 13th grade. Preference for violent TV, 3rd to 13th grade: .05. Aggression, 3rd to 13th grade: .38. Within 3rd grade, TV preference and aggression correlate .21; within 13th grade, -.05. The cross-lagged paths are .31 (3rd-grade TV preference to later aggression) and .01 (3rd-grade aggression to later TV preference).]

Third Variable Problem: Class Exercise
Identify the third variable Z that influences both X and Y.

Third Variable Problem (examples)
Class attendance is positively correlated with GPA; the third variable is motivation. The number of mosques is positively correlated with the crime rate; the third variable is the size of the population. Ice cream consumed is positively correlated with the number of drownings; the third variable is temperature. Reading score is positively correlated with reading comprehension; the third variable is IQ.

Data Preparation for Correlation
Screen the data for outliers and ensure that there is evidence of a linear relationship, since correlation is a measure of linear relationship. The assumption is that each pair is bivariate normal; if not normal, then use Spearman.

Correlation in SPSS
For this exercise we will be using the data from the CD, under Chapter 8, korelasi.sav. This data set is a subset of a case-control study on factors affecting SGA in Kelantan. Open the data and select Analyse > Correlate > Bivariate.

We want to see whether there is any association between the mothers' weight and the babies' weight, so select the variables (weight2 and birthwgt) into Variables, select Pearson correlation coefficients, and click OK.

Correlation Results (SPSS output):
                               WEIGHT2   BIRTHWGT
WEIGHT2   Pearson Correlation  1         .431*
          Sig. (2-tailed)                .017
          N                    30        30
BIRTHWGT  Pearson Correlation  .431*     1
          Sig. (2-tailed)      .017
          N                    30        30
*. Correlation is significant at the 0.05 level (2-tailed).

The r = 0.431 and the p value, 0.017, is significant. The r value indicates a fair and positive linear relationship. If the correlation is significant, it is best to include the scatter diagram. [Figure: scatter plot of birth weight against mothers' weight, R² = 0.1861.] The r square indicates that mothers' weight contributes about 19% of the variability of the babies' weight. (A short sketch of the equivalent calculation outside SPSS follows below.)

Spearman/Kendall Correlation
Used to find the correlation between a related pair of continuous data that is not normally distributed, or between one continuous and one categorical (ordinal) variable, e.g. the association between a Likert scale of work satisfaction and work output.

Spearman's rank correlation coefficient
In statistics, Spearman's rank correlation coefficient, named for Charles Spearman and often denoted by the Greek letter ρ (rho), is a non-parametric measure of correlation: it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. Unlike the Pearson product-moment correlation coefficient, it does not require the assumption that the relationship between the variables is linear, nor does it require the variables to be measured on interval scales; it can be used for variables measured at the ordinal level.

Example: correlation between sphericity and visual acuity. Sphericity of the eyeball is continuous data while visual acuity is ordinal data (6/6, 6/9, 6/12, 6/18, 6/24), therefore Spearman correlation is the most suitable. The Spearman rho correlation coefficient is -0.108 and p is 0.117. Since p is larger than 0.05, there is no significant association between sphericity and visual acuity.
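For readers without SPSS, a minimal sketch of the same bivariate Pearson correlation in Python; the file name korelasi.csv and its column layout are assumptions (the original korelasi.sav is not reproduced here, though pandas.read_spss could read it directly if pyreadstat is installed).

```python
# Bivariate Pearson correlation outside SPSS, assuming the weight2/birthwgt
# columns have been exported to a CSV (hypothetical file "korelasi.csv").
import pandas as pd
from scipy import stats

df = pd.read_csv("korelasi.csv")            # expected columns: weight2, birthwgt
r, p = stats.pearsonr(df["weight2"], df["birthwgt"])
print(f"r = {r:.3f}, p (2-tailed) = {p:.3f}, n = {len(df)}")
# For the slide's data this should reproduce r = 0.431, p = 0.017.
```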

Example 2: correlation between glucose level and systolic blood pressure
Based on the data given, prepare a ranking table: for every variable, sort the data by rank (for ties, take the average rank), calculate the difference of ranks, d, for every pair, square it, and take the total. Include the value in the following formula:
rs = 1 - (6 Σd²) / (n(n² - 1))
With Σd² = 4921.5 and n = 32, rs = 1 - ((6 × 4921.5) / (32 × (32² - 1))) = 0.097966. This is the value of the Spearman correlation coefficient (ρ). Compare the value against the Spearman table: 0.098 < 0.364 (the critical value at p = 0.05), so p is larger than 0.05 and therefore there is no significant association between systolic BP and blood glucose level. (A short code sketch of this rank calculation follows below.)

SPSS output (Spearman's rho):
                                GLU      BPS1
GLU    Correlation Coefficient  1.000    .097
       Sig. (2-tailed)                   .599
       N                        32       32
BPS1   Correlation Coefficient  .097     1.000
       Sig. (2-tailed)          .599
       N                        32       32

Linear Regression
Regression and prediction: the predictor variable (e.g. height) is used to predict the criterion variable (e.g. weight), i.e. predicting Y based on a given value of X.
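A minimal sketch of the rank-based Spearman calculation described above, with made-up glucose and systolic BP values standing in for the slide's data set; only the formula rs = 1 - 6Σd²/(n(n² - 1)) is taken from the slide.

```python
# Rank-based Spearman correlation: manual formula versus scipy's built-in.
import numpy as np
from scipy import stats

glucose  = np.array([5.2, 6.1, 4.8, 7.3, 5.9, 6.4])   # mmol/L, made up
systolic = np.array([128, 135, 118, 142, 130, 125])   # mmHg, made up

# Rank each variable (ties would get the average rank), then apply
# r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)).
rank_x = stats.rankdata(glucose)
rank_y = stats.rankdata(systolic)
d = rank_x - rank_y
n = len(glucose)
r_s = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# scipy's version also returns a p value.
rho, p = stats.spearmanr(glucose, systolic)
print(f"manual r_s = {r_s:.3f}, scipy rho = {rho:.3f}, p = {p:.3f}")
```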

Regression
Regression is used when one variable is a direct cause of the other, or when, if the value of one variable is changed, the other variable also changes as a direct consequence, or when the main purpose of the analysis is the prediction of one variable from the other. Regression looks for the dependence of one variable, the dependent variable, on another, the independent variable; the relationship is summarised by a regression equation, y = a + bx.

The regression line
x: independent variable, on the horizontal axis. y: dependent variable, on the vertical axis. Regression equation: y = a + bx, where a is the intercept at the y axis and b is the regression coefficient. The test of significance asks whether b is different from zero.

Linear Regression: testing for significance
Come up with a linear regression model to predict a continuous outcome from a continuous risk factor, e.g. predict BP from age; this is usually the next step after a correlation is found to be strongly significant. For y = a + bx, test whether the slope is significantly different from zero by t = b / se(b). For example: BP = regression estimate (b) × age + constant (a) + error term (ε). (A short regression sketch follows below.)
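A short sketch of fitting y = a + bx and testing the slope with t = b/se(b), using hypothetical age and BP values (the slides do not give raw data for this example).

```python
# Simple linear regression of systolic BP on age with a slope test; made-up data.
import numpy as np
from scipy import stats

age = np.array([25, 32, 41, 47, 53, 58, 64, 70])
bp  = np.array([118, 121, 127, 130, 135, 138, 141, 147])

res = stats.linregress(age, bp)           # fits y = a + b*x by least squares
t = res.slope / res.stderr                # t = b / se(b), d.f. = n - 2
print(f"BP = {res.intercept:.2f} + {res.slope:.3f} * age")
print(f"t = {t:.2f}, p = {res.pvalue:.4f}")   # linregress's p value tests b = 0
```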

Regression Line
In a scatterplot showing the association between two variables, the regression line is the best-fit line, with formula y = a + bx, where a is the place where the line crosses the y axis and b is the slope of the line (rise/run). Thus, given a value of x, we can predict a value of y. [Figure: scatterplot of total ball toss points against distance from target (8 to 26 feet), R² = 0.6031; reading off the fitted line, y ≈ 47 at x = 18 and y ≈ 20 at x = 24.]

Regression Equation
y = a + bx, where y is the predicted value of y, b is the slope of the line, x is the value of x that you enter, and a is the y-intercept (where the line crosses the y axis). In this case, y = 125.401 - 4.263x. The regression line is the line for which the sum of squared vertical distances between the points on the scatterplot and the line is a minimum (relative to other possible lines). So if the distance is 20 feet: y = -4.263(20) + 125.401 = -85.26 + 125.401 = 40.141.

Example (calculating the regression coefficients by hand)
Σx = 6426, Σx² = 1338088, Σy = 4631, Σxy = 929701, n = 32
b = (Σxy - (Σx)(Σy)/n) / (Σx² - (Σx)²/n) = (929701 - (6426 × 4631 / 32)) / (1338088 - 6426²/32) = -0.00549
Mean x = 6426/32 = 200.8125; mean y = 4631/32 = 144.71875
a = ȳ - b·x̄ = 144.71875 + (0.00549 × 200.8125) = 145.8212
Systolic BP = 145.821 - 0.00549 × cholesterol
(A code sketch of this calculation follows below.)

SPSS Regression Set-up
The criterion is the y-axis variable, what you are trying to predict; the predictor is the x-axis variable, what you are basing the prediction on.
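A sketch reproducing the hand calculation of b and a from the summary totals above (cholesterol as x, systolic BP as y); the prediction at cholesterol = 200 at the end is an illustrative extra.

```python
# Regression coefficients from the summary totals quoted above.
sum_x, sum_x2 = 6426, 1338088
sum_y, sum_xy = 4631, 929701
n = 32

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x**2 / n)
a = sum_y / n - b * (sum_x / n)          # a = mean(y) - b * mean(x)
print(f"b = {b:.5f}, a = {a:.3f}")       # expect b ≈ -0.00549, a ≈ 145.821

# Predicted systolic BP at a hypothetical cholesterol value of 200:
print(a + b * 200)
```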

Getting Regression Info from SPSS

Model Summary:
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .777a   .603       .581                18.476
a. Predictors: (Constant), Distance from target

Coefficients (Dependent Variable: Total ball toss points):
Model 1                Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)             125.401            14.265                           8.791    .000
Distance from target   -4.263             .815         -.777               -5.230   .000

Reading a and b off the Coefficients table gives y = a + bx; for a distance of 20 feet, y = 125.401 - 4.263(20). (A sketch of comparable output produced outside SPSS follows.)
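A sketch of how comparable output (R square, coefficients, standard errors, t, Sig.) can be produced outside SPSS with statsmodels; the ball-toss numbers below are made up, not the slides' original data.

```python
# Ordinary least squares regression with a full summary table, statsmodels style.
import numpy as np
import statsmodels.api as sm

distance = np.array([8, 10, 12, 14, 16, 18, 20, 22, 24, 26])   # feet, made up
points   = np.array([95, 88, 72, 70, 61, 47, 40, 35, 22, 20])  # made up

X = sm.add_constant(distance)          # adds the intercept (constant) term
model = sm.OLS(points, X).fit()
print(model.summary())                 # R-squared, coefficients, std. errors, t, p

# model.params[0] is the constant (a) and model.params[1] the slope (b),
# analogous to the B column of SPSS's Coefficients table.
```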