Chapter 7: CORRELATION

Upon completion of this chapter, you should be able to:
• Explain the concept of relationship between variables
• Discuss the use of statistical tests to determine correlation
• Interpret SPSS outputs on correlation tests

CHAPTER OVERVIEW
• Introduction
• What is correlation coefficient?
  o Pearson product moment
• Range of values
  o Positive correlation
  o Negative correlation
  o Zero correlation
• Calculation of the correlation coefficient
• SPSS correlation coefficient
• Correlation and causation
• Summary
• Key Terms

Chapter 1: Introduction
Chapter 2: Descriptive Statistics
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square

This chapter introduces the concept of correlation and how it is used in analysing educational data. The correlation coefficient is a useful statistical tool for showing the relationship between two variables. The coefficient can range from -1.00 to +1.00, though in the behavioural sciences there is seldom a perfect positive or negative correlation between two variables. However, it should be emphasised that correlation is not causation. In other words, even though there is a high correlation between A and B, it does not mean that A caused B.

Introduction

Researchers are often concerned with the way two variables relate to each other for a given group of persons, such as students in schools or workers in a factory or office. For example, do students who have higher scores in mathematics also have higher scores in science? Is there a relationship between a person's self-esteem and his or her personality? Is there a relationship between attitudes towards reading and the number of books read? Is there a relationship between years of experience as a teacher and attitudes towards teaching? These are some of the questions asked by educational researchers. To answer these questions, you must make observations or collect data on each variable for a group of persons.

What is Correlation Coefficient?

The correlation coefficient, a concept from statistics, is a measure of how closely two sets of values vary together. In forecasting, for example, it measures how well trends in the predicted values follow trends in past actual values, that is, how well the predicted values from a forecast model "fit" the real-life data. The strength of the relationship is given by the size (absolute value) of the coefficient, a number between 0 and 1; the sign shows the direction, so the coefficient itself ranges from -1.00 to +1.00. If there is no relationship between the predicted values and the actual values, the correlation coefficient is 0 or very low (the predicted values are no better than random numbers). As the strength of the relationship between the predicted values and actual values increases, so does the correlation coefficient. A perfect fit gives a coefficient of 1.0. Thus, for a forecast model, the higher the correlation coefficient the better.

a) Pearson Product Moment Correlation Coefficient

Pearson's product moment correlation coefficient, usually denoted by r, is one example of a correlation coefficient. It is a measure of the linear association between two variables that have been measured on interval or ratio scales, such as the relationship between amount of education and income levels. If there is a relationship between amount of education and income levels, the two variables co-vary.
b) Assumptions Testing

Correlational analysis has the following underlying assumptions (S. Coakes and L. Steed, 2002, SPSS Analysis Without Anguish. Brisbane: John Wiley & Sons):
• Related Pairs: the data must be collected from related pairs; i.e. if you obtain a score on an X variable, there must be a score on the Y variable from the same subject.
• Scale of Measurement: the data should be interval or ratio in nature.
• Normality: the scores for each variable should be normally distributed.
• Linearity: the relationship between the two variables must be linear.

• Homogeneity of Variance: the variability in scores for one variable is roughly the same at all values of the other variable; i.e. it is concerned with how uniformly the scores cluster about the regression line.

c) Strength of the Correlation

The strength of a relationship is indicated by the size of the correlation coefficient: the larger the correlation, the stronger the relationship. A strong relationship exists where cases with one value of the X variable usually have a particular value on the Y variable, while those with a different value of X have a different value on Y. For example, if people who exercise regularly nearly always have better health than those who do not exercise, then exercise and health are strongly correlated. If those who exercise regularly are just a little more likely to be healthy than the non-exercisers, then the two variables are only weakly related.

How high does a correlation coefficient have to be to be called strong? How small is a weak correlation? The answer to these questions varies with the variables being studied. For example, if the literature shows that previous research found a correlation of 0.51 between variable X and variable Y, but in your study you obtained a correlation of 0.60, then you might conclude that the correlation between variables X and Y is strong. However, Cohen (1988) has provided some guidelines for judging the strength of the relationship between two variables by providing descriptors for the coefficients (see Table 7.1). Keep in mind that in education and psychology the coefficients will rarely be very strong or near perfect, since the variables measured are constructs involving human characteristics which are subject to wide variation.

Coefficient    Descriptor
0.01-0.09      Trivial
0.10-0.29      Low to Moderate
0.30-0.49      Moderate to Substantial
0.50-0.69      Substantial to Very Strong
0.70-0.89      Very Strong
> 0.90         Near Perfect

Table 7.1 General guidelines on the strength of the relationship between variables
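The descriptors in Table 7.1 can be applied mechanically. The following is a minimal sketch, not part of the original chapter: the function name `describe_strength` is hypothetical, the bands are assumed to run continuously from each lower bound up to the next one (the table leaves small gaps such as 0.09 to 0.10), and the label for values below 0.01 is an assumption because the table does not cover them.

```python
def describe_strength(r: float) -> str:
    """Return the Table 7.1 descriptor for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on the magnitude, not the sign
    # Assumption: each band starts at its lower bound and runs up to the next band.
    if size < 0.01:
        return "No relationship"  # hypothetical label; Table 7.1 starts at 0.01
    if size < 0.10:
        return "Trivial"
    if size < 0.30:
        return "Low to Moderate"
    if size < 0.50:
        return "Moderate to Substantial"
    if size < 0.70:
        return "Substantial to Very Strong"
    if size < 0.90:
        return "Very Strong"
    return "Near Perfect"  # assumption: 0.90 itself is grouped with "> 0.90"


print(describe_strength(0.63))   # Substantial to Very Strong
print(describe_strength(-0.25))  # Low to Moderate
```

Because the sign only indicates direction, a coefficient of -0.25 receives the same descriptor as +0.25.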

EXAMPLE: Data were gathered for the following two variables from a sample of 12 students.

Student No.    IQ Score (X)    Science Test Score (Y)
1              120             31
2              112             25
3              110             19
4              120             24
5              103             17
6              126             28
7              113             18
8              114             20
9              106             16
10             108             15
11             128             27
12             109             19

Each unit or student is represented by a point on the scatter diagram. A dot is placed for each student at the point of intersection of a straight line drawn through his IQ score perpendicular to the X axis and a straight line drawn through his Science score perpendicular to the Y axis. For example, one student who obtained an IQ score of 120 also obtained a Science score of 24. The intersection between these lines is represented by the dot 'A'. The scatter diagram (see Figure 7.1) shows a positive relationship between IQ scores and Science scores. However, we do not have a summarised measure of this relationship. There is a need for a more precise measure to describe the relationship between the two variables. You need a numerical descriptive measure of the correlation between IQ scores and Science scores, which will be discussed later.

[Figure 7.1 Scatter Diagram Showing the Relationship between IQ Scores (X axis) and Science Scores (Y axis) for 12 Students, with the dot 'A' marked]

Range of Values (rxy)

Note that rxy can never take on a value less than -1 nor a value greater than +1. The following are three graphs showing various values of rxy and the type of linear relationship that exists between X and Y for the given values of rxy.

a) POSITIVE CORRELATION

Value of rxy = +1.00 = Perfect & Direct Relationship

[Figure 7.2 Perfect Correlation. X axis: Attitude Towards English (1 to 4); Y axis: English Score (3 to 5)]

See Figure 7.2. If Attitudes (x) and English Achievement (y) have a positive relationship, then the slope (β1) will be a positive number. Lines with positive slopes go from the bottom left toward the upper right; i.e. an increase from 1 to 2 on the x axis is followed by an increase from 3 to 3.5 on the y axis.

b) NEGATIVE CORRELATION

Value of rxy = -1.00 = Perfect Inverse Relationship

[Figure 7.3 Negative Correlation. X axis: Attitude Towards English (1 to 4); Y axis: English Score (3 to 5)]

If Attitudes (x) and English Achievement (y) have a negative relationship, then the slope (β1) will be a negative number. Lines with negative slopes go from the upper left to the lower right. The above graph has a slope of -0.5: an increase of 1 on the x axis is associated with a decrease of 0.5 on the y axis; i.e. an increase from 1 to 2 on the x axis is followed by a decrease from 5 to 4.5 on the y axis.

c) ZERO CORRELATION

Value of rxy = 0.00 = No Relationship

[Figure 7.4 No Correlation. X axis: Attitude Towards English (1 to 4); Y axis: English Score (3 to 5)]

If Attitudes (x) and English Achievement (y) have NO relationship, then the slope (β1) will be ZERO (see Figure 7.4). In other words, there is NO SYSTEMATIC RELATIONSHIP between X and Y. Some students with high Attitude scores have low English scores, while some students with low Attitude scores have high English scores.
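The three cases above can be reproduced numerically. This is a minimal sketch under stated assumptions: the helper `pearson_r` is a hypothetical name, the rising and falling scores are points read off the lines described for the perfect positive and negative cases (from 3 to 3.5 and from 5 to 4.5 as x goes from 1 to 2), and the "unrelated" scores are invented purely to illustrate a zero correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


attitude = [1, 2, 3, 4]

# Points on a line with slope +0.5 (3.0, 3.5, 4.0, 4.5): perfect direct relationship
rising = [2.5 + 0.5 * x for x in attitude]

# Points on a line with slope -0.5 (5.0, 4.5, 4.0, 3.5): perfect inverse relationship
falling = [5.5 - 0.5 * x for x in attitude]

# Invented scores with no systematic trend (assumption, for illustration only)
unrelated = [4, 3, 3, 4]

print(round(pearson_r(attitude, rising), 2))     # 1.0
print(round(pearson_r(attitude, falling), 2))    # -1.0
print(round(pearson_r(attitude, unrelated), 2))  # 0.0
```

Note that the size of the slope (0.5 here) does not affect r: any set of points lying exactly on a rising straight line gives +1.00, and any set lying exactly on a falling line gives -1.00.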

Calculation of the Correlation Coefficient (r or rxy)

A researcher conducted a study to determine the relationship between verbal and spatial ability. She was interested in finding out whether students who scored high on verbal ability also scored high on spatial ability. She administered two 15-item tests measuring verbal and spatial ability to a sample of 12 primary school students. The results of the study are shown in the table below:

Student    Verbal Test (x)    Spatial Test (y)    x²     y²     xy
1          13                 7                   169    49     91
2          10                 6                   100    36     60
3          12                 9                   144    81     108
4          14                 10                  196    100    140
5          10                 7                   100    49     70
6          12                 11                  144    121    132
7          13                 12                  169    144    156
8          9                  10                  81     100    90
9          14                 13                  196    169    182
10         11                 12                  121    144    132
11         8                  9                   64     81     72
12         9                  8                   81     64     72

Σx = 135    Σy = 114    Σx² = 1565    Σy² = 1138    Σxy = 1305

a) Illustration of the Calculation of the Correlation Coefficient (r or rxy) for the Data in the Table Above

The Pearson Correlation Coefficient (called the Pearson r) is the formula commonly used to compute the correlation between two variables. The formula measures the strength and direction of a linear relationship between variable X and variable Y. The sample correlation coefficient is denoted by r. The formula for the sample correlation coefficient is:

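\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} \]

where, with Σx = 135, Σy = 114, Σx² = 1565, Σy² = 1138, Σxy = 1305 and n = 12,

\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1305 - \frac{(135)(114)}{12} = 22.50 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1565 - \frac{(135)^2}{12} = 46.25 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1138 - \frac{(114)^2}{12} = 55.00 \]

Using the formula to obtain the correlation coefficient:

\[ r = \frac{22.50}{\sqrt{(46.25)(55.00)}} = 0.446 \]

To Obtain a Bivariate Pearson Product-Moment Correlation Using SPSS

A study was conducted to determine the relationship between reading ability and performance in science. A reading ability test and a science test were administered to 200 lower secondary students. The Pearson product-moment correlation was used to determine the significance of the relationship. The steps for using SPSS are listed under SPSS Procedures.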
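As a cross-check on the hand calculation of r for the verbal and spatial test scores earlier in this section, the sums can be recomputed directly from the raw scores. This is a minimal sketch using only the Python standard library; the variable names are illustrative and the computational formula assumed is r = SSxy / sqrt(SSxx × SSyy).

```python
import math

# Raw scores for the 12 students in the verbal/spatial ability table
verbal = [13, 10, 12, 14, 10, 12, 13, 9, 14, 11, 8, 9]
spatial = [7, 6, 9, 10, 7, 11, 12, 10, 13, 12, 9, 8]

n = len(verbal)
sum_x = sum(verbal)
sum_y = sum(spatial)
sum_x2 = sum(x * x for x in verbal)
sum_y2 = sum(y * y for y in spatial)
sum_xy = sum(x * y for x, y in zip(verbal, spatial))

# Sums of squares and cross-products about the means
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"Sum x = {sum_x}, Sum y = {sum_y}, Sum xy = {sum_xy}")
print(f"Sum x^2 = {sum_x2}, Sum y^2 = {sum_y2}")
print(f"SSxy = {ss_xy:.2f}, SSxx = {ss_xx:.2f}, SSyy = {ss_yy:.2f}")
print(f"r = {r:.3f}")
```

Recomputing from the raw scores is a useful habit because a single slip in a squared value carries through the column totals into SSxx, SSyy and finally r.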

SPSS Procedures:
1. Select the Analyze menu.
2. Click on Correlate and then Bivariate... to open the Bivariate Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and click on the arrow button to move the variables into the Variables: box.
4. Ensure that the Pearson correlation option has been selected.
5. In the Test of Significance box, select the One-tailed radio button.
6. Click on OK.

SPSS Output:

                                   Reading    Science
Reading   Pearson Correlation      1.000      0.630**
          Sig. (1-tailed)                     0.000
          N                        200        200
Science   Pearson Correlation      0.630**    1.000
          Sig. (1-tailed)          0.000
          N                        200        200

To interpret the correlation coefficient, you examine the coefficient and its associated significance value (p). The output shows that the relationship between Reading and Science scores is significant, with a correlation coefficient of r = 0.63 and p < .05. Thus higher reading scores are associated with higher scores in science.

NULL HYPOTHESIS

The null hypothesis (Ho:) states that the correlation between X and Y in the population is ρ = 0.0. What is the probability that the correlation obtained in the sample came from a population where the parameter ρ = 0.0? The t-test for the significance of a correlation coefficient is used to answer this question. Note that the correlation between Reading and Science (r = 0.630) is significant at p < 0.05. Hence, the null hypothesis is REJECTED, which supports the conclusion that the two variables are positively related in the population.
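The t-test mentioned above can be sketched as follows. The chapter does not print the formula, so this example assumes the standard test statistic t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The function name `t_for_correlation` is hypothetical, and the critical value is an approximate figure taken from a standard t table rather than computed.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.630, 200  # values from the SPSS output for Reading and Science
t = t_for_correlation(r, n)

# Assumption: approximate one-tailed critical t at alpha = .05 for 198 df,
# read from a standard t table.
critical_t = 1.653

print(f"t = {t:.2f}, df = {n - 2}")
print("Reject H0" if t > critical_t else "Do not reject H0")
```

With 200 students the t statistic is far beyond the critical value, which is consistent with the significance value of 0.000 reported in the SPSS output.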

Coefficient of Determination

r = the correlation between X and Y = 0.630, and r² = the coefficient of determination = (0.630)² = 0.3969. Hence 39.7% of the variance in Y can be explained by X.

TO OBTAIN A SCATTERPLOT USING SPSS

SPSS Procedures:
1. Select the Graphs menu.
2. Click on Scatter... to open the Scatterplot dialogue box.
3. Ensure that the Simple Scatterplot option is selected.
4. Click on the Define command pushbutton to open the Simple Scatterplot subdialogue box.
5. Select the first variable (i.e. science) and click on the arrow button to move the variable into the Y Axis: box.
6. Select the second variable (i.e. reading) and click on the arrow button to move the variable into the X Axis: box.
7. Click on OK.

SPSS Output

[Figure 7.5 Scatterplot. X axis: READING (20 to 80); Y axis: SCIENCE (30 to 80)]

As you can see from the scatterplot (Figure 7.5), there is a linear relationship between Reading and Science scores. Given that the scores cluster uniformly around the regression line, the assumption of homogeneity of variance has not been violated.

Causation and Correlation

Causation and correlation are two concepts that have been wrongly interpreted by some researchers. The presence of a correlation between two variables does not necessarily mean that there exists a causal link between them. Say, for instance, that there is a correlation (0.60) between "teachers' salary" and "academic performance of students". Does this imply that a well-paid teaching staff "causes" better academic performance of students? Would academic performance improve if we increased the pay of teachers? It is dangerous to conclude causation just because there is a correlation or relationship between two variables. The correlation tells us nothing by itself about whether "teachers' salary" causes "achievement".

Significance of the Correlation Coefficient

We introduced Pearson correlation as a measure of the strength of a relationship between two variables. But any relationship should be assessed for its significance as well as its strength. The significance of the relationship is expressed in probability levels: p (e.g., significant at p = .05). This tells how unlikely a given correlation coefficient, r, is to occur if there is no relationship in the population. It assumes that you have a sample of cases from a population. The question is whether your observed statistic for the sample is likely to be observed given some assumption about the corresponding population parameter. If your observed statistic does not exactly match the population parameter, perhaps the difference is due to sampling error.

To be useful, a correlation coefficient needs to be accompanied by a test of statistical significance. It is also important for you to know the sample size. Generally, a strong correlation in a small sample may be statistically nonsignificant, while a much weaker correlation in a large sample may be statistically significant. For example, in a large sample, even low correlations (as low as 0.06) can be statistically significant. Correlations of similar size that are statistically significant with large samples are not significant with smaller samples. This is because with smaller samples the likelihood of sampling error is higher.
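The effect of sample size can be illustrated with two invented scenarios. This is a sketch under stated assumptions: it uses the standard test statistic t = r·sqrt(n − 2) / sqrt(1 − r²), the sample sizes (10 and 2000) are hypothetical, and the two-tailed critical values at the .05 level are approximate figures from a standard t table.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical scenarios (not data from the chapter)
t_small = t_for_correlation(0.50, 10)    # fairly strong r, small sample
t_large = t_for_correlation(0.06, 2000)  # very weak r, large sample

# Assumption: approximate two-tailed critical t values at alpha = .05
critical_small = 2.306  # df = 8
critical_large = 1.961  # df = 1998

print(f"r = 0.50, n = 10:   t = {t_small:.2f}, significant: {t_small > critical_small}")
print(f"r = 0.06, n = 2000: t = {t_large:.2f}, significant: {t_large > critical_large}")
```

In this sketch the correlation of 0.50 fails to reach significance with 10 cases, while the correlation of 0.06 is significant with 2000 cases, so significance on its own says little about the strength of a relationship.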

LEARNING ACTIVITY

A researcher conducted a study which aimed to determine the relationship between self-efficacy and academic performance in geography. A 20-item self-efficacy scale and a 25-item geography test were administered to a group of 12 students. The following are the results of the study:

Self-Efficacy Scale    Geography Test
15                     22
13                     17
14                     20
12                     18
16                     23
12                     21
11                     19
17                     24
15                     19
13                     16

a) What is the correlation coefficient?
b) What is the mean for the self-efficacy scale and the mean for the geography test?
c) Comment on the scatter plot.

SUMMARY

• The correlation coefficient, a concept from statistics, is a measure of how well trends in the predicted values follow trends in past actual values.
• Pearson's product moment correlation coefficient, usually denoted by r, is a measure of the linear association between two variables.
• The null hypothesis (Ho:) states that the correlation between X and Y is ρ = 0.0.
• The presence of a correlation between two variables does not necessarily mean there exists a causal link between them.
• The strength of a relationship is indicated by the size of the correlation coefficient: the larger the correlation, the stronger the relationship.
• The scatterplot is a graphical representation of the intersection of a point on the x-axis with the point on the y-axis.
• The coefficient of determination is the proportion of variance in Y that can be explained by X.

KEY WORDS:
• Correlation
• Correlation coefficient
• Pearson product moment
• Range of values
• Positive correlation
• Negative correlation
• Zero correlation
• Scatterplot
• Causation
• Coefficient of determination