Statistics Introductory Correlation


Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018


Statistics are not used only to describe the central tendency and variability of a single variable. Rather, statistics can also be used to describe relationships between variables.

A correlation exists when changes in one variable are statistically associated with systematic changes in another variable; hence, a correlation is a type of bivariate relationship.

Objective of this lecture This lecture introduces methods for measuring and describing the strength of the relationship between quantitative variables.

Examples of relationships between variables appear in the table below. Can you identify which examples show a relationship between X and Y and which do not?

1. Ex. 1 shows a relationship: as values of X increase, values of Y increase.
2. Ex. 2 also shows a relationship: as values of X decrease, values of Y increase. Even though the scores for the two variables head in opposite directions, there is still a systematic change in Y that corresponds to the changes in X.
3. Ex. 3 shows a curvilinear relationship. Notice that as X increases from 3 to 5 to 7, Y increases from 2 to 4 to 6, but as X continues to increase, the values of Y decrease from 6 to 4 to 2. This is still a relationship: as values of X increase, there is a systematic change in Y, which first increases and then decreases.
4. Ex. 4 and Ex. 5 do not show relationships between X and Y.

Scatterplots Determining whether a relationship is present between two variables can be difficult when looking only at the raw x-y data pairs, so to see relationships most people begin by creating a scatterplot. Example: say we measure n = 20 people on the following variables: Family Guy Watching, the number of Family Guy episodes a person watches per week, and Intelligence, as measured by an IQ test.


Scatterplots You can see from the data that as the number of Family Guy episodes watched (X) increases, Intelligence scores (Y) also increase. Scatterplots are used to display whether a relationship between two variables is positive linear, negative linear, curvilinear, or absent.

Scatterplots A positive linear relationship is observed when the values of both variables trend in the same direction.

Scatterplots A negative linear relationship is observed when the values of the variables trend in opposite directions (an inverse relationship). That is, as the values of one variable increase, the values of the other variable tend to decrease.

Scatterplots There are many forms of curvilinear relationship, but generally such a relationship exists whenever the direction of the relationship between the variables changes, for example when Y first increases and then decreases as X increases.

Scatterplots A relationship is absent whenever there is no systematic change in one variable as the other changes.
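A rough numeric check can mirror what a scatterplot shows. The sketch below is an illustration written for this note (the data and the `trend_direction` helper are made up, not from the lecture): it sorts the pairs by X and checks whether Y moves consistently in one direction. Note that it cannot separate a curvilinear relationship from an absent one, which is exactly why a scatterplot is the better first step.

```python
def trend_direction(xs, ys):
    """Rough numeric analogue of eyeballing a scatterplot: sort the
    (x, y) pairs by x, then check whether y moves consistently up,
    consistently down, or neither."""
    pairs = sorted(zip(xs, ys))
    diffs = [b[1] - a[1] for a, b in zip(pairs, pairs[1:])]
    if all(d > 0 for d in diffs):
        return "positive"
    if all(d < 0 for d in diffs):
        return "negative"
    # Both a curvilinear and an absent relationship end up here.
    return "no consistent linear trend"

# Hypothetical data: episodes watched (X) vs. IQ score (Y)
episodes = [1, 2, 3, 4, 5]
iq = [95, 100, 104, 110, 115]
print(trend_direction(episodes, iq))  # positive
```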

Pearson Correlation Coefficient The statistic most commonly used to measures the correlation between two quantitative variables is the Pearson Produce-Moment Correlation Coefficient or, more succinctly, the Pearson correlation (r or rxy). What for The Pearson correlation is used to measure the strength and direction of a linear relationship between two quantitative variables. However it has to meet some conditions

Pearson Correlation Coefficient However, the data have to meet some conditions:

1. The variables must be quantitative (measured on an interval or ratio scale); they cannot be categorical (nominal) or ordinal. The Spearman correlation is used to measure the correlation when at least one variable is ordinal, and chi-square analyses are used when the data come from nominal scales.
2. Each variable must span a wide range of its potential values: if only a limited or restricted range of values is measured, you may not observe the true relationship.
3. The relationship must not be curvilinear: the Pearson correlation measures the direction and degree of the relationship between two variables only when that relationship is linear.

Pearson Correlation Coefficient Some characteristics:

1. It measures the degree of linearity between variables.
2. It has a range of −1.00 to +1.00.
3. The closer to −1.00 or +1.00, the more linear the relationship between the variables.
4. A Pearson correlation equal to −1.00 or +1.00 indicates a perfect linear relationship between the variables.
5. The sign (+/−) of the Pearson correlation tells you whether the relationship is positive-linear or negative-linear.
6. The sign says nothing about the strength of the linear relationship.
7. A zero correlation (r = 0) indicates no linear relationship between the variables.
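Characteristics 2-5 can be verified numerically. Below is a minimal, stdlib-only Pearson r (a sketch written for this note, not the lecture's code), applied to perfectly linear data to show the ±1 endpoints and the meaning of the sign:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: standardized covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    scp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross products
    ssx = sum((x - mx) ** 2 for x in xs)                    # sum of squares, X
    ssy = sum((y - my) ** 2 for y in ys)                    # sum of squares, Y
    return scp / math.sqrt(ssx * ssy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))   # 1.0  (perfect positive linear)
print(pearson_r(x, [-3 * v + 10 for v in x])) # -1.0 (perfect negative linear)
```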

Calculating the Pearson Correlation (r) The Pearson correlation is defined as the standardized covariance between two quantitative variables. Recall from standard scores that when you calculate a z-score, you divide the difference between a raw score and the mean of a distribution by the standard deviation of the distribution. Here, a measure referred to as covariance is divided by the product of two standard deviations; hence, this measure of covariance is standardized. So what is covariance?


Before introducing covariance, let me introduce a set of data that will be used to calculate the Pearson correlation. Below is a set of data for n = 10 people. In this hypothetical set of data, assume I measured the age (X) of these ten people and measured the number of books that each of these 10 people read per month (Y).

Variance is the average variability among scores for a single variable. You can examine the scores for the variable Age (X) and for the variable Books Read per Month (Y) in the table and see that those scores vary; this is variance. Covariance is the average co-variation of scores between variables, that is, the average amount by which two variables change together. The difference between variance and covariance: covariance is the average variation between scores from two variables; variance is the average variation among scores of a single variable.

Variance and Covariance Covariance is the average variation between scores from two variables; variance is the average variation among scores of a single variable.

Variance: ŝ² = Σ(X − X̄)² / (n − 1)

Covariance: cov = Σ[(X − X̄)(Y − Ȳ)] / (n − 1)

To calculate the covariance you divide the numerator by n − 1, rather than by n, because we are estimating the covariance in a population from sample data.
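The two formulas translate directly into code. This is an illustrative stdlib-only sketch (the function names are mine), using the same n − 1 denominator as the slides:

```python
def sample_variance(xs):
    """s^2 = sum((x - mean)^2) / (n - 1)"""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_covariance(xs, ys):
    """cov = sum((x - mx)(y - my)) / (n - 1)"""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3]
y = [2, 4, 6]
print(sample_variance(x))       # 1.0
print(sample_covariance(x, y))  # 2.0
```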

Covariance is formally defined as the average sum of the cross products between two variables. The sum of the cross products (SCP) is the sum of the products of the mean-centered scores from each variable: SCP = Σ[(X − X̄)(Y − Ȳ)]. Calculating the sum of cross products is similar to calculating a sum of squares.

Importantly: unlike a sum of squares, which is always positive, the sum of cross products can be positive or negative. If SCP is positive, the covariance will be positive and the correlation is positive-linear. If SCP is negative, the covariance will be negative and the correlation is negative-linear. If SCP = 0, it indicates a zero relationship.

The next step is to calculate the covariance. Conceptually, all that you need to do is divide the SCP from above by n − 1:

cov = Σ[(X − X̄)(Y − Ȳ)] / (n − 1) = SCP / (n − 1) = 43 / (10 − 1) = 4.778

The final step in calculating a Pearson correlation is to standardize the covariance. The formula for the Pearson correlation is:

r = cov / (ŝX ŝY)

The numerator is the covariance (covXY = 4.778). The denominator is the product of the estimated standard deviations of the two variables:

ŝX = √(SSX / (n − 1)) = √(647 / 9) = 8.479

ŝY = √(SSY / (n − 1)) = √(26 / 9) = 1.7

Now we have all three pieces needed to calculate the Pearson correlation:

r = cov / (ŝX ŝY) = 4.778 / ((8.479)(1.7)) = 4.778 / 14.41 = 0.331

The question is what a correlation coefficient of this size indicates. The larger the absolute value of the Pearson correlation (that is, the closer to +1.00 or −1.00), the stronger the linear relationship. A correlation of r = .33 is generally considered large in the behavioral sciences, because there is so much variation in behavior.
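The whole chain of calculations can be reproduced from the slides' summary numbers (SCP = 43, SSX = 647, SSY = 26, n = 10). A quick stdlib-only check, written for this note:

```python
import math

# Summary values taken from the slides' worked example (age vs. books read)
n = 10
scp = 43    # sum of cross products
ss_x = 647  # sum of squares for X (age)
ss_y = 26   # sum of squares for Y (books per month)

cov = scp / (n - 1)              # covariance: 43/9 = 4.778
s_x = math.sqrt(ss_x / (n - 1))  # 8.479
s_y = math.sqrt(ss_y / (n - 1))  # 1.700
r = cov / (s_x * s_y)            # 0.3315 (0.331 on the slide, which rounds intermediates)

print(round(r, 2))  # 0.33
```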

Proportion of Explained Variance and Residual (Unexplained) Variance The coefficient of determination, r², is calculated by squaring the Pearson correlation. This value is the proportion of variance in one variable that is accounted for (explained) by its linear relationship with the other variable. You can think of r² as the proportion of the variability that is correctly predicted by the relationship. IMPORTANT: the coefficient of determination can never be negative and has a range of 0 to 1. In the example, the coefficient of determination is r² = 0.331² = 0.11.

Proportion of Explained Variance and Residual (Unexplained) Variance Residual variance is calculated by subtracting the coefficient of determination from 1: (1 − r²). Residual variance is the proportion of variance between the two variables that is not accounted for by the relationship; it is the proportion of the X-Y variability that does not co-vary in the direction indicated by the relationship. IMPORTANT: residual variance can never be negative and has a range of 0 to 1. In the example, the residual variance is (1 − r²) = 1 − 0.11 = 0.89.
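Continuing the same worked example, both quantities are one line each (the value of r comes from the slides):

```python
r = 0.331  # Pearson correlation from the worked example
r_squared = round(r ** 2, 2)        # coefficient of determination
residual = round(1 - r_squared, 2)  # residual (unexplained) variance
print(r_squared, residual)  # 0.11 0.89
```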

Characteristics of the Pearson Correlation Having a correlation between two variables does not mean that one variable caused the variation in the other variable: correlation does not mean causation! The only way to determine whether changes in one variable cause changes in another variable is by manipulating an independent variable and conducting an experiment.

Statistical significance of a Pearson correlation Testing the significance of a Pearson correlation depends on whether the correlation under the null hypothesis is assumed to be zero or some non-zero value. When the correlation under the null hypothesis is assumed to be zero, you use a type of t-test to assess the significance of r. When, under the null hypothesis, the correlation is assumed to be a value other than zero, you use Fisher's z-test to assess significance. Recall, the value of a correlation has a range of −1.00 to +1.00, and a zero correlation (r = 0) indicates no association between the variables. The symbol for the Pearson correlation in a population is the Greek lowercase rho (ρ). Thus, the null and alternate hypotheses predict:

H0: ρ = 0    H1: ρ ≠ 0

Statistical significance of a Pearson correlation Notice: the alternate hypothesis H1: ρ ≠ 0 does not say whether the correlation will be positive-linear or negative-linear. If the alternate hypothesis predicts that the correlation will be positive-linear, the hypotheses are: H0: ρ = 0, H1: ρ > 0. If the alternate hypothesis predicts that the correlation will be negative-linear: H0: ρ = 0, H1: ρ < 0.

Determining Statistical Significance Say you are interested in whether the Pearson correlation between age and book-reading behavior from the earlier sections is statistically significant. H0: ρ = 0, H1: ρ ≠ 0. Recall, from that example, n = 10 and r = 0.331. To determine whether this correlation is statistically significant, we use the following t-test:

t = r / √[(1 − r²) / (n − 2)]

Determining Statistical Significance t = r / √[(1 − r²) / (n − 2)]. BEFORE THE CALCULATIONS, IMPORTANT: this t-test is used only when ρ is assumed to be zero under the null hypothesis. We select an alpha of α = .05 for a non-directional (two-tailed) alternate hypothesis. For the Pearson correlation, the degrees of freedom are n − 2, because we must account for the degrees of freedom in each variable: df = 10 − 2 = 8.

t = 0.331 / √[(1 − 0.331²) / (10 − 2)]

Determining Statistical Significance t = 0.331 / √[(1 − 0.331²) / (10 − 2)] = 0.331 / √(0.89 / 8) = 0.331 / 0.334 = 0.99. This is the test statistic we use to assess the statistical significance of the Pearson correlation. Looking it up in the t-table at 95% confidence (α = .05, two-tailed) with df = 8 gives a critical value of t = 2.306. Given that the calculated t is lower than this critical value, we cannot reject the null hypothesis.
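The decision rule above can be scripted. This sketch is mine, not the lecture's code; the two-tailed critical value for df = 8 at α = .05 is hardcoded from a t-table (as on the slide) rather than computed:

```python
import math

n, r = 10, 0.331
df = n - 2  # degrees of freedom for a Pearson correlation

# t-test for H0: rho = 0 (only valid when the null value is zero)
t = r / math.sqrt((1 - r ** 2) / df)

# Two-tailed critical value for df = 8 at alpha = .05, read from a t-table
t_crit = 2.306

print(round(t, 2))      # 0.99
print(abs(t) < t_crit)  # True -> fail to reject H0
```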

Power and Effect size for Pearson Correlation The effect size of the Pearson correlation is the absolute value of the Pearson correlation.

