THE PEARSON CORRELATION COEFFICIENT

CORRELATION
Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable. This is known as a bivariate relation.
There are several different ways to categorize relations.
One way is based on the precision of the relation:
  Functional relations
  Statistical relations
We can also categorize relations as to whether or not they imply that changes in one variable actually cause changes in the second variable:
  Causal relations
  Correlational relations

THE PEARSON CORRELATION COEFFICIENT
We often use the Pearson correlation coefficient (r) to measure a correlational relation between two variables that are both continuous.
There is no true independent variable here.
Example: How is height related to weight?

WHAT DO CORRELATIONS TELL US?
Direction of the Relation
  a. Positive correlation (+): variables move in the same direction
  b. Negative correlation (-): variables move in opposite directions
Form of the Relation
  a. Linear
  b. Nonlinear
Magnitude of the Relation
  a. Ranges from -1.00 to +1.00
  b. No correlation = 0

SCATTERPLOTS
We often use scatterplots to detect the presence of a correlation between variables.
These are easy to draw by hand, and to graph in Excel or SPSS.
Let's do some examples of judging correlational relations from scatterplots.

HOW R IS CALCULATED
r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
r = (Σ z_X z_Y) / n
df = n - 2
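The z-score formula above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the height and weight values are made up for the example, and the z-scores are assumed to use the population standard deviation (dividing by n), which is what makes the division by n in the formula give a value between -1 and +1.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as the mean product of z-scores: r = sum(z_X * z_Y) / n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumption: population standard deviations (divide by n), matching the /n above.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / n

# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [60, 62, 65, 68, 70]
weights = [115, 120, 140, 155, 160]

r = pearson_r(heights, weights)
df = len(heights) - 2  # degrees of freedom for testing r
print(round(r, 3), df)
```

If the z-scores were computed with the sample standard deviation (dividing by n - 1), the sum of products would be divided by n - 1 instead.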

ISSUES INTERPRETING CORRELATIONS
Correlation does not equal causation.
The correlation value can be greatly affected when the range of scores is limited (the restricted range problem).
One or two extreme data points, or outliers, can dramatically affect the correlation value.
To describe how accurately one variable predicts the other, use r² (the coefficient of determination).
  For r = .5, 25% of the variability in one variable can be predicted from the other variable (.5² = .25).
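A small sketch of two of these points: how r² is obtained from r, and how a single outlier can change the correlation. The data are hypothetical, and `pearson_r` here is an illustrative helper that uses the sum-of-products form, which gives the same value as the z-score formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed as SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r_without = pearson_r(xs, ys)   # 0.8 for these data
r_squared = r_without ** 2      # 0.64: 64% of the variability is predictable

# Add one extreme data point (X = 10, Y = 0) and recompute.
r_with = pearson_r(xs + [10], ys + [0])

print(round(r_without, 2), round(r_squared, 2), round(r_with, 2))
```

With these made-up numbers, the one added point is enough to turn a strong positive correlation into a negative one.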

WHAT DOES A REGRESSION LINE DO?
Makes the relationship between two variables easier to see.
Identifies the center of the relationship, providing a simplified description of the relationship.
Establishes a precise relationship between each X value and a corresponding Y value. Thus, the line can be used for prediction.

WHY USE REGRESSION?
Goal of Regression: Find the equation for the line that best describes the relation for a set of X and Y data, in order to predict future values.
Much like the correlation, we use regression when we have continuous variables involved.
The same limitations apply: regression, like correlation, does not establish a causal relation.

LINEAR EQUATIONS
Express a linear relationship between two variables (X and Y) as: Y = bX + a
b is called the slope
  Determines how much the Y variable changes when X is increased by one unit
  Tells us the direction of the line and how steep it will be
a is called the Y-intercept
  Tells us where the line crosses the Y axis

LINEAR EQUATIONS
[Figure: line graph of Y (Amount Due) against X (Hours of Exercise at YMCA)]
Y = 5(X) + 20
Amt Due = $5(Hrs Exercise) + $20
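As a quick check of the YMCA equation, the sketch below evaluates Y = 5(X) + 20 for a few values of X. The function name `amount_due` is just an illustrative label.

```python
def amount_due(hours_of_exercise):
    """Y = bX + a with slope b = 5 (dollars per hour) and Y-intercept a = 20 (dollars)."""
    slope = 5
    intercept = 20
    return slope * hours_of_exercise + intercept

# Amount due for 0, 1, and 10 hours of exercise.
amounts = [amount_due(hours) for hours in (0, 1, 10)]
print(amounts)  # [20, 25, 70]
```

At X = 0 the amount due equals the Y-intercept ($20), and each additional hour adds the slope ($5).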

THE LEAST-SQUARES SOLUTION
Y = bX + a
Numerically define the distance between the line and each data point:
  For each X, the regression equation predicts a Y value. This is called Ŷ ("Y hat").
  The distance between the actual Y and the predicted value is Y - Ŷ. It can be positive or negative.
The best-fitting line is the one that has the smallest total squared error. This is commonly called the least-squared-error solution.
Ŷ = bX + a is the regression equation for Y.
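The slides do not give the formulas for b and a. A minimal sketch, assuming the standard least-squares results b = SP / SS_X and a = M_Y - b(M_X), where SP is the sum of products of deviations and M is a mean, might look like this. The data and function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b, a) for the line Y-hat = bX + a that minimizes the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a

def sum_squared_error(xs, ys, b, a):
    """Sum of (Y - Y-hat)^2 over all data points for the line Y-hat = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b, a = least_squares_line(xs, ys)
sse_best = sum_squared_error(xs, ys, b, a)
# Any other line, such as one with a steeper slope, has a larger squared error.
sse_other = sum_squared_error(xs, ys, b + 0.5, a)

print(round(b, 2), round(a, 2), round(sse_best, 2), round(sse_other, 2))
```

The errors are squared before summing so that positive and negative distances do not cancel each other out.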

THE LEAST-SQUARES SOLUTION
[Figure: scatterplot with regression line showing the relationship between average hours of sleep per night (X) and grades (Y)]
Regression Equation: Ŷ = 6.6X + 36.4
Grade = 6.6(Hrs Sleep) + 36.4

THE STANDARD ERROR OF THE ESTIMATE
In reality, data points rarely fall exactly on the regression line (which would indicate a perfect correlation).
The regression equation allows us to make predictions, but it does not provide information about the accuracy of those predictions.
Compute the standard error of estimate to measure the typical distance between the regression line and the actual data points.

THE STANDARD ERROR OF THE ESTIMATE
Returning to our example of the relationship between average hours of sleep per night (X) and grades (Y):
[Figure: scatterplot of Grade against Hours of Sleep with the regression line Ŷ = 6.6X + 36.4]
Standard Error of Estimate = 4.77
The typical distance between data points and the regression line is 4.77.
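The slides report the standard error of estimate without showing the computation. A rough sketch, assuming the usual definition sqrt(SS_residual / df) with df = n - 2, is shown below. The data are hypothetical (the sleep and grade scores behind the 4.77 value are not given in the slides), and the line Ŷ = 0.8X + 0.6 is the least-squares line for these made-up points.

```python
import math

def standard_error_of_estimate(xs, ys, b, a):
    """Typical distance between the data points and the line Y-hat = bX + a."""
    n = len(xs)
    # SS_residual: sum of squared distances between actual Y and predicted Y-hat.
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_residual / (n - 2))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
b, a = 0.8, 0.6  # least-squares slope and intercept for these data

see = standard_error_of_estimate(xs, ys, b, a)
print(round(see, 2))
```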

RELATIONSHIP BETWEEN STANDARD ERROR AND CORRELATION
When the correlation is large in magnitude (close to +1 or -1), the standard error of estimate will be small.
  Points are clustered close to the regression line.
When the correlation is small (close to zero), the standard error of estimate will be large.
  Points are spread out and not close to the regression line.
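This relationship can be written as a formula. The version below is a standard result that the slides do not show; it assumes SS_Y is the sum of squared deviations of the Y scores and that the standard error of estimate uses df = n - 2.

```latex
\[
SS_{\text{residual}} = (1 - r^2)\, SS_Y
\qquad\Longrightarrow\qquad
\text{standard error of estimate}
  = \sqrt{\frac{SS_{\text{residual}}}{n - 2}}
  = \sqrt{\frac{(1 - r^2)\, SS_Y}{n - 2}}
\]
```

As r approaches +1 or -1, the factor (1 - r²) approaches 0 and so does the standard error. As r approaches 0, the standard error approaches its largest value for the data.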

WHAT IS THE CHI-SQUARE?
All of our tests through chapter 14 tested the relation between categorical IVs and a continuous DV.
Correlations and regression examined the relation between two continuous variables.
A chi-square is a nonparametric test used to test the relation between 2 categorical variables.
Example: How is gender related to political orientation?

THE HOW OF THE CHI-SQUARE
The chi-square (χ²) uses sample frequencies to test a hypothesis about the presence of a relationship between two variables.
How well do the observed frequencies fit the expected frequencies specified by the null hypothesis?
The null hypothesis states that the two variables are independent:
  No consistent, predictable relationship
  The frequency distribution for one variable has the same shape for all categories of the second variable
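A minimal sketch of the comparison between observed and expected frequencies follows. It assumes the standard formulas for the test of independence, which the slides do not spell out: expected frequency = (row total × column total) / n, χ² = Σ (observed - expected)² / expected, and df = (rows - 1)(columns - 1). The table values and the function name are hypothetical.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequencies if the two variables are independent (the null hypothesis).
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical 2 x 2 table of counts, for illustration only.
# Rows: categories of one variable; columns: categories of the other.
observed = [[30, 20],
            [20, 30]]

chi2, df, expected = chi_square_independence(observed)
print(chi2, df, expected)
```

The larger the gap between observed and expected frequencies, the larger χ² becomes, and the less plausible it is that the two variables are independent.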