Relationship Between Interval and/or Ratio Variables: Correlation & Regression. Sorana D. BOLBOACĂ

Similar documents
Can you tell the relationship between students SAT scores and their college grades?

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Correlation and Linear Regression

Intro to Linear Regression

Unit 6 - Introduction to linear regression

9 Correlation and Regression

Intro to Linear Regression

Bivariate Relationships Between Variables

Business Statistics. Lecture 10: Correlation and Linear Regression

1 A Review of Correlation and Regression

REVIEW 8/2/2017 陈芳华东师大英语系

Unit 6 - Simple linear regression

Chapter 8: Correlation & Regression

Chapter 6: Exploring Data: Relationships Lesson Plan

Correlation: Relationships between Variables

Example: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV)

Chapter 4 Describing the Relation between Two Variables

THE PEARSON CORRELATION COEFFICIENT

Chapter 7. Scatterplots, Association, and Correlation

Mrs. Poyner/Mr. Page Chapter 3 page 1

Linear Regression and Correlation. February 11, 2009

Correlation & Simple Regression

About Bivariate Correlations and Linear Regression

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:

Analysing data: regression and correlation S6 and S7

Correlation. What Is Correlation? Why Correlations Are Used

Correlation and regression

Chapter 5 Friday, May 21st

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed)

Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)

Inferences for Regression

Reminder: Student Instructional Rating Surveys

Key Concepts. Correlation (Pearson & Spearman) & Linear Regression. Assumptions. Correlation parametric & non-para. Correlation

Simple Linear Regression Using Ordinary Least Squares

Important note: Transcripts are not substitutes for textbook assignments. 1

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM

Chapter 16: Correlation

Correlation and simple linear regression S5

CORELATION - Pearson-r - Spearman-rho

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

Statistics in medicine

Correlation and Simple Linear Regression

Measuring Associations : Pearson s correlation

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Scatterplots and Correlation

Key Algebraic Results in Linear Regression

Correlation and Regression

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

9. Linear Regression and Correlation

Linear Correlation and Regression Analysis

Lecture 11: Simple Linear Regression

Statistics 5100 Spring 2018 Exam 1

y n 1 ( x i x )( y y i n 1 i y 2

Simple Linear Regression

Bivariate data analysis

Math 1710 Class 20. V2u. Last Time. Graphs and Association. Correlation. Regression. Association, Correlation, Regression Dr. Back. Oct.

Scatter plot of data from the study. Linear Regression

Review. Number of variables. Standard Scores. Anecdotal / Clinical. Bivariate relationships. Ch. 3: Correlation & Linear Regression

Readings Howitt & Cramer (2014) Overview

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Chapter 16: Correlation

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

Applied Regression Analysis

Chapter 12 Summarizing Bivariate Data Linear Regression and Correlation

Chapter 9 - Correlation and Regression

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

Regression. Marc H. Mehlman University of New Haven

Midterm 2 - Solutions

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Review of Multiple Regression

CHAPTER 4 DESCRIPTIVE MEASURES IN REGRESSION AND CORRELATION

Simple Linear Regression

Scatter plot of data from the study. Linear Regression

Statistics Introductory Correlation

SPSS Output. ANOVA a b Residual Coefficients a Standardized Coefficients

Practical Biostatistics

This gives us an upper and lower bound that capture our population mean.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter 10. Correlation and Regression. Lecture 1 Sections:

Chapter 27 Summary Inferences for Regression

Prob/Stats Questions? /32

appstats27.notebook April 06, 2017

AMS 7 Correlation and Regression Lecture 8

Linear regression and correlation

Describing Bivariate Relationships

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

Week 8: Correlation and Regression

Chs. 16 & 17: Correlation & Regression

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

23. Inference for regression

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Readings Howitt & Cramer (2014)

Review of Statistics 101

AP Statistics Two-Variable Data Analysis

Slide 7.1. Theme 7. Correlation

CREATED BY SHANNON MARTIN GRACEY 146 STATISTICS GUIDED NOTEBOOK/FOR USE WITH MARIO TRIOLA S TEXTBOOK ESSENTIALS OF STATISTICS, 3RD ED.

Introduction to Linear Regression

Chs. 15 & 16: Correlation & Regression

Transcription:

Relationship Between Interval and/or Ratio Variables: Correlation & Regression Sorana D. BOLBOACĂ

OUTLINE Correlation Definition Deviation Score Formula, Z score formula Hypothesis Test Regression - Intercept and Slope - Un-standardized Regression Line - Standardized Regression Line - Hypothesis Tests 2

Correlation: 3 Characteristics 1. Direction Positive(+) Negative (-) 2. Degree of association Between 1 and 1 Absolute values signify strength 3 3. Form Linear Non-linear

C2 C2 Correlation: 1. Direction Positive 20.0 C1 vs C2 120.0 C1 vs C2 Negative 80.0 13.3 40.0 6.7 0.0 0.0 4.0 8.0 12.0 C1 Large values of X = large values of Y Small values of X = small values of Y 0.0 0.0 83.3 166.7 250.0 C1 Large values of X = small values of Y Small values of X = large values of Y 4 - e.g. IQ and SAT -e.g. SPEED and ACCURACY

C2 C2 Correlation: 2. Degree of association 20.0 Strong (tight cloud) C1 vs C2 120.0 Weak C1 vs C2 (diffuse cloud) 13.3 80.0 6.7 40.0 0.0 0.0 4.0 8.0 12.0 C1 0.0 0.0 4.0 8.0 12.0 C1 5

Correlation: 3. Form Linear Non- linear 6

Weight (pounds) What is the best fitting straight line? Regression Equation: Y = a + bx How closely are the points clustered around the line? Pearson s R 250 225 200 175 150 67 68 69 70 71 72 73 74 75 76 77 7 Height (inches)

Correlation: Definition Correlation: a statistical technique that measures and describes the degree of linear relationship between two variables Dataset Scatterplot Obs X Y A 1 1 B 1 3 C 3 2 D 4 5 E 6 4 F 7 5 Y 8 X

The logic of regression MEAN of X Below average on X Above average on X MEAN of Y Above average on Y Below average on X Above average on Y Above average on X Below average on Y Below average on Y 9 Cross-Product = ( X X )( Y Y ) For a strong positive association, the cross-products will mostly be positive

The logic of regression MEAN of X Below average on X Above average on X MEAN of Y Above average on Y Below average on X Above average on Y Above average on X Below average on Y Below average on Y 10 Cross-Product = ( X X )( Y Y ) For a strong negative association, the cross-products will mostly be negative

The logic of regression MEAN of X Below average on X Above average on X MEAN of Y Above average on Y Below average on X Above average on Y Above average on X Below average on Y Below average on Y 11 Cross-Product = ( X X )( Y Y ) For a weak association, the cross-products will be mixed

Pearson Correlation Coefficient Symbol: r, R A value ranging from -1.00 to 1.00 indicating the strength (look to the number of correlation coefficient) and direction (look to the sign of the correlation coefficient) of the linear relationship. Absolute value indicates strength +/- indicates direction Sum of products r X X Y Y 2 X X Y Y 2 12

Pearson Correlation Coefficient Assumptions: 1. The errors in data values are independent from one another 2. Correlation always requires the assumption of a straightline relationship 3. The variables are assumed to follow a bivariate normal distribution 13

Bivariate Normal Distribution http://www.aos.wisc.edu/~dvimont/aos575/handouts/b ivariate_notes.pdf 14

Pearson Correlation Coefficient Femur Humerus ( X X ) ( Y Y ) 2 ( X X ) 2 ( Y Y ) ( X X )( Y Y ) A 38 41 B 56 63 C 59 70 D 64 72 E 74 84 Mean 58.2 66.00 SS X SS Y SP r SP SS X SS Y 15

Pearson Correlation Coefficient Femur Humerus ( X X ) ( Y Y ) ( X X ) A 38 41-20.2-25 408.04 625 505 B 56 63-2.2-3 4.84 9 6.6 C 59 70 0.8 4.64 16 3.2 D 64 72 5.8 6 33.64 36 34.8 E 74 84 15.8 18 249.64 324 284.4 mean 58.2 66.00 696.8 1010 834 SS X SS Y SP 2 2 Y ( X X )( Y Y ) ( Y ) 16 r = 0.99

Pearson Correlation Coefficient For a strong positive association, the SP (sum of products) will be a big positive number Below average on X Above average on Y Below average on X Below average on Y Above average on X Above average on Y Above average on X Below average on Y 17

Pearson Correlation Coefficient For a strong negative association, the SP will be a big negative number Below average on X Above average on X Above average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y 18

Pearson Correlation Coefficient For a weak association, the SP will be a small number (+ and will cancel each other out) Below average on X Above average on X Above average on Y Above average on Y Below average on X Above average on X Below average on Y Below average on Y 19

Pearson Correlation Coefficient: Interpretation A measure of strength of association: how closely do the points cluster around a line? A measure of the direction of association: is it positive or negative? Colton [Colton T. Statistics in Medicine. Little Brown and Company, New York, NY 1974] rules: R [-0.25 to +0.25] No relation R (0.25 to +0.50] (-0.25 to -0.50] weak relation R (0.50 to +0.75] (-0.50 to -0.75] moderate relation R (0.75 to +1) (-0.75 to -1) strong relation 20

Pearson Correlation Coefficient: Interpretation The P-value is the probability that you would have found the current result if the correlation coefficient were in fact zero (null hypothesis). If this probability is lower than the conventional significance level (e.g. 5%) (p < 0.05) the correlation coefficient is called statistically significant. 21

Spearman Rank Correlation Coefficient Not continuous measurements The assumption of bivariate normal distribution is violated Symbol: ρ (Rho Greek Letter) 22

Spearman Rank Correlation Coefficient The sign of the Spearman correlation indicates the direction of association between X (the independent variable) and Y (the dependent variable). ρ =1 the two variables being compared are monotonically related. N.B. This does not give a perfect Pearson correlation. 23

Interpretation of r-squared (r 2 ) The amount of covariation compared to the amount of total variation The percent of total variance that is shared variance E.g. If r = 0.80, then X explains 64% of the variability in Y (and vice versa) 24 24

Properties of correlation coefficient A standardized statistic will not change if you change the units of X or Y. The same whether X is correlated with Y or vice versa Fairly unstable with small n Vulnerable to outliers Has a skewed distribution 25 25

Correlation coefficient by example Hester KL, Macfarlane JG, Tedd H, Jary H, McAlinden P, Rostron L, Small T, Newton JL, De Soyza A. Fatigue in bronchiectasis. QJM. 2011 Oct 20. [Epub ahead of print] Results: Fatigue correlated with MRCD score (Medical Research Council dyspnoea score) (r = 0.57, P < 0.001) and FEV(1)% predicted (r = -0.30, P = 0.001). 26

27 Correlation coefficient by example

Correlation coefficient by example Canan F, Ataoglu A, Ozcetin A, Icmeli C. The association between Internet addiction and dissociation among Turkish college students. Compr Psychiatry. 2011 Oct 13. [Epub ahead of print] RESULTS: According to the Internet Addiction Scale, 9.7% of the study sample was addicted to the Internet. The Pearson correlation analysis results revealed a significant positive correlation between dissociative experiences and Internet addiction (r = 0.220; P <.001) and weekly Internet use (r = 0.227; P <.001). Levels of Internet addiction were significantly higher among male students than female students (P <.001). The Internet use pattern also differed significantly between sexes. 28

Linear Regression Simple Linear regression Multiple linear regression 29

Linear Regression: Assumptions The errors in data values (e.g. the deviation from average) are independent from one another Regressions depends on the appropriateness of the model used in the fit The independent readings (X) are measured as exactly known values (measured without error) The variance of Y is the same for all values of X The distribution of Y is approximately normal for all values of X 30

Linear Regression But how do we describe the line? If two variables are linearly related it is possible to develop a simple equation to predict one variable from the other The outcome variable is designated the Y variable, and the predictor variable is designated the X variable E.g. centigrade to Fahrenheit: F = 32 + 1.8ºC this formula gives a specific straight line 31 31

Linear Equation 32 F = 32 + 1.8(C) General form is Y = a + bx The prediction equation: Ỹ = a+ bx a = intercept, b = slope, X = the predictor, Y = the criterion a and b are constants in a given line; X and Y change 32 32

Slope and Intercept Equation of the line: Ỹ = a + bx The slope b: the amount of change in Y with one unit change in X sy SP b r s SS The intercept a: the value of Y when X is zero x X a Y bx The slope is influenced by r, but is not the same as r 33

Age (years) When there is no linear association (r = 0), the regression line is horizontal, b=0 45 40 35 30 25 20 68 69 70 71 72 73 74 75 76 77 34 Height (inches) and our best estimate of age is 29.5 at all heights. 34

Height (cm) When the correlation is perfect (r = ± 1.00), all the points fall along a straight line with a slope 200 195 190 35 185 180 175 170 68 69 70 71 72 73 74 75 76 77 Height (inches) 35

Weight (pounds) When there is some linear association (0< r <1), the regression line fits as close to the points as possible and has a slope 250 225 200 175 150 67 68 69 70 71 72 73 74 75 76 77 Height (inches) 36 36

Where did this line come from? It is a straight line which is drawn through a scatterplot, to summarize the relationship between X and Y It is the line that minimizes the squared deviations (Ỹ Y) 2 We call these vertical deviations residuals 37 37

Regression Line Minimizing the squared vertical distances, or residuals 38

Regression Coefficients Table Predictor Unstandardized Coefficient Standard error t p Intercept a SE a t=a/se a Variable X b SE b t=b/se b 39 39

40 Linear Regression

Linear Regression 41 41

Correlation & Regression: Summary Both correlation and simple linear regression can be used to examine the presence of a linear relationship between two variables providing certain assumptions about the data are satisfied. The results of the analysis need to be carefully interpreted, particularly when looking for a causal relationship or when using the regression equation for prediction. 42 42