Wed, June 26 (Lecture 8-2). Nonlinearity. Significance test for correlation. R-squared, SSE, and SST. Correlation in SPSS.


Last time, we looked at scatterplots, which show the relationship between two variables, and at correlation.

The correlation coefficient r measures how well the pairs of values fit on a line. r is positive when the two values increase together. r is negative when one value goes up as the other goes down.

However, correlation only shows the linear relation between two variables. The variables could still be related in a non-linear way and have little or no correlation.

In real-world contexts, the most common form of non-linear relationship is a curvilinear one. (SOURCE: GAPMINDER.ORG) One common reason is a scaling issue, where a fixed change in one thing doesn't mean a fixed change in another.

Life expectancy increases with the logarithm of income, not with income itself. (SOURCE: GAPMINDER.ORG) When we rescale income onto a log scale (a scale that shows very small and very large numbers equally well), a line appears.
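A quick way to see this effect is with made-up numbers. The sketch below is not from the lecture; the income and life-expectancy values are synthetic, and it just compares the Pearson correlation before and after the log rescaling:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
income = 10 ** rng.uniform(2.7, 5, 200)                        # synthetic incomes, roughly $500 to $100,000
life_exp = 45 + 5 * np.log10(income) + rng.normal(0, 1, 200)   # made-up curvilinear link to life expectancy

r_raw, _ = pearsonr(income, life_exp)             # relationship is curved on the raw income scale
r_log, _ = pearsonr(np.log10(income), life_exp)   # close to a straight line on the log scale
print(r_raw, r_log)                               # r_log should come out noticeably closer to 1
```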

Another reason for non-linearity could be two competing factors. In a too-easy course, nobody learns anything new. In a too-hard course, nobody learns anything at all.

Spearman correlation is a measure that can handle curves as long as the trend doesn't switch between increasing and decreasing. The only time we'll be using it is as a check in SPSS. Everything else we do in Chapters 10 and 11 uses the Pearson correlation, which is restricted to linear relationships. We use the Pearson correlation because it produces stronger results and the math is simpler.
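To illustrate the difference (an aside, not something we need for the course), here is a small sketch with synthetic data where the relationship is always increasing but strongly curved:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = np.exp(x) + rng.normal(0, 5, 100)   # always increasing, but strongly curved

r_pearson, _ = pearsonr(x, y)     # noticeably below 1: the trend is not a line
r_spearman, _ = spearmanr(x, y)   # much closer to 1: the trend never switches direction
print(round(r_pearson, 3), round(r_spearman, 3))
```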

Math: The ugly sweater around an otherwise pretty graph.

You can do hypothesis testing on correlations. We may be interested in whether or not there is a correlation between two variables. Since samples are random, the sample correlation will come out a little above or below zero just by chance, even when there is no true correlation. How far from zero does a sample correlation have to be before it's significant?

We test this with a t-score computed from the correlation. The null hypothesis is: the true correlation is zero. The alternative is: the correlation is not zero.
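The test statistic has the standard form for a correlation test, which matches how it is described in the next few paragraphs:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^{2}}}, \qquad df = n - 2
```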

The t in this formula is the same t-score as in chapters 6 and 7. This t-score gets compared to the critical values in the t-table at n-2 degrees of freedom.

The stronger the correlation, the farther r gets from zero. As r gets farther from zero, the t-score gets bigger. So a stronger correlation gives you a higher t-score. A stronger correlation means better evidence of a correlation.

The t-score also increases with sample size. As usual, the sample size is under a square root. Having more data points makes it easier to detect correlations.

A larger t-score meant more evidence against the null, just like before. So a large t-score means more evidence of a correlation.

If there's a weak correlation and a small sample, we might not detect it. (Example: n=10, r=.25)

t* = 1.397, at 8 df, 0.20 significance. t* = 2.306, at 8 df, 0.05 significance. No significant evidence of a correlation. p > 0.20
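As a quick arithmetic check (mine, not the slide's), plugging n = 10 and r = 0.25 into the formula above gives

```latex
t = \frac{0.25\sqrt{10-2}}{\sqrt{1-0.25^{2}}} \approx \frac{0.707}{0.968} \approx 0.73,
```

which is well below even the 0.20-level critical value of 1.397.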

What if we get a larger sample of this correlation? (n=46, r=0.25) We should get some evidence of a correlation, but not much.

t* = 1.684, at 44 df, 0.10 significance. t* = 2.021, at 44 df, 0.05 significance. Weak evidence of a correlation, 0.05 < p < 0.10.
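The same arithmetic check with n = 46 and r = 0.25:

```latex
t = \frac{0.25\sqrt{46-2}}{\sqrt{1-0.25^{2}}} \approx \frac{1.658}{0.968} \approx 1.71,
```

which lands between 1.684 and 2.021, hence 0.05 < p < 0.10.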

What happens when you get a near perfect correlation? (Example: n=10, r=.99). Expectation: Very strong evidence of a correlation.

t* = 2.306, at 8 df, 0.05 significance. t* = 5.041, at 8 df, 0.001 significance. Reality: Very strong evidence of a correlation.

The bottom of the formula gets very small when r is close to 1, and dividing by a small number gives you something huge. The same thing happens with a near-perfect negative correlation, but the t-score is negative and huge.
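If you want to verify all three examples numerically instead of using the t-table, here is a short Python sketch; the helper name corr_t_test is just something made up for illustration:

```python
import math
from scipy.stats import t as t_dist

def corr_t_test(r, n):
    """t statistic and two-sided p-value for H0: true correlation = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p = 2 * t_dist.sf(abs(t), df=n - 2)
    return t, p

for r, n in [(0.25, 10), (0.25, 46), (0.99, 10)]:
    t, p = corr_t_test(r, n)
    print(f"r = {r}, n = {n}: t = {t:.2f}, p = {p:.4f}")
# Expected pattern: no real evidence for the first case, weak evidence
# (p between 0.05 and 0.10) for the second, and far beyond 0.001 for the third.
```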

For interest: You can always put a line exactly through two points. With only two points, we have no idea what the true correlation is. Only points after the first two tell us about the correlation. That's why correlation has n-2 degrees of freedom.

More math? More ugly sweaters! Show your pet some love by forcing it into a tea cosy.

First, we need to set down a convention. We're looking at two variables of the same object. We call these variables x and y. Example: If we were talking about dragons, X could be the length and Y could be the width. X is the independent/explanatory variable (the one we control or can measure more precisely); Y is the dependent/response variable.

When x and y are correlated, we say that some of the variation in y is explained by x. Meaning: Across all values of x, the range of y can be large.

But if we only consider a particular x (or a small x-interval), the range of y shrinks considerably. Y varies less for a particular X. Y has less variance when accounting for X.

r² is the proportion by which the variance of y is reduced when accounting for x. r = 0.6 in this graph, so r² = 0.6² = 0.36. 36% of the variation in Y is explained by X.

The same proportion of variance is explained for a negative correlation of equal strength. A negative times itself is positive, so r² is always between 0 and 1.

In a perfect correlation, knowing x automatically gives you y as well. So there is no variation in y left to explain. r = 1 or -1, so r² = 1. All of the variation in y is explained by x. When two values are uncorrelated, using a linear function of x to guess at y is useless. r = 0, so r² = 0. None of the variation in y is explained by x.

The total squared difference from the mean of y is called the sum of squares total, or SST. SST is the total squared length of all the vertical red lines.

If we fit a line through the middle of the points in the scatterplot (called a regression line, the subject of chapter 11), the vertical lines, on average, get shorter. The total squared length of these lines is the sum of squared error, or SSE.

The stronger the correlation, the shorter the vertical lines get. In other words, the errors get smaller, and the sum of squared error shrinks with them. Here, the correlation is very strong, and there are barely any errors at all.

r² can also be expressed in terms of SSE and SST: r² = (SST - SSE) / SST = 1 - SSE/SST. SST is the total amount of variation in Y; SSE is the amount of variation in Y left unexplained by X. When r² is zero, SSE is the same as SST. When r² is one, SSE disappears completely.
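Here is a small numerical check of that relationship (a sketch with synthetic data, not part of the lecture), using numpy's polyfit to fit the regression line:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 2 * x + rng.normal(size=50)          # any correlated pair of variables will do

slope, intercept = np.polyfit(x, y, 1)   # least-squares regression line (chapter 11's topic)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)        # total squared distance of y from its mean
sse = np.sum((y - y_hat) ** 2)           # squared distance left over after using the line

r = np.corrcoef(x, y)[0, 1]
print(r ** 2, 1 - sse / sst)             # the two values agree (up to rounding)
```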

An ugly sweater for every occasion! Even SPSS!

To find a correlation in SPSS, go to Analyze → Correlate → Bivariate ("bivariate" means two-variable).

Pick the variables you want to correlate and drag them to the right. The Pearson correlation coefficient MUST be selected. The Spearman coefficient is optional.

There is a correlation of r = .940 between weight and height. It's a significant correlation, with a p-value of less than .001 (it shows up as Sig. (2-tailed) = .000). Also, anything correlates with itself perfectly, so the correlation between length and length is r = 1.
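The lecture does everything through the SPSS menus, but if you ever want to double-check the two numbers in that table, the same quantities can be computed in Python; the height and weight lists below are placeholders, not the course data:

```python
from scipy.stats import pearsonr

height = [150, 160, 165, 170, 175, 180, 185, 190]   # placeholder values
weight = [50, 56, 60, 66, 70, 77, 82, 90]           # placeholder values

r, p = pearsonr(height, weight)
print(f"r = {r:.3f}, two-tailed p = {p:.5f}")   # p corresponds to SPSS's Sig. (2-tailed)
```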

To build a scatterplot, go to Graphs → Legacy Dialogs → Scatter/Dot.

Choose Simple Scatter if it's not already picked, and click Define.

Move the independent variable into the x-axis box and the dependent variable into the y-axis box, then click OK (way at the bottom).

Our result: There is a definite upward trend, so the strong positive correlation of r = 0.940 makes sense.
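Outside SPSS, the same kind of plot takes a couple of lines with matplotlib (again using placeholder data, not the course dataset):

```python
import matplotlib.pyplot as plt

height = [150, 160, 165, 170, 175, 180, 185, 190]   # placeholder values
weight = [50, 56, 60, 66, 70, 77, 82, 90]           # placeholder values

plt.scatter(height, weight)      # independent variable on the x-axis, dependent on the y-axis
plt.xlabel("height")
plt.ylabel("weight")
plt.show()
```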

Next time: Residuals, Outliers and Influence, and the assumption of constant variance.