Correlation and regression

NST 1B Experimental Psychology
Statistics practical 1: Correlation and regression
Rudolf Cardinal & Mike Aitken
11 / 12 November 2003
Department of Experimental Psychology, University of Cambridge

Handouts: Answers to Examples 1 (from last time); Handout 2 (correlation + regression); Examples 2 (correlation + regression); Data summaries for mental rotation.

pobox.com/~rudolf/psychology

Have you read the Background Knowledge? Did you remember: the Tables and Formulae; a calculator; your data from the last practical (mental rotation)? If not, oops!

Plan of this session: we'll cover the ideas and techniques of correlation and regression (the handout covers everything you need to know and more); you can analyse your own data (have a go at the examples); you can ask questions (about this practical, the background material, or anything statistical) and we'll try to help. Afterwards, practise.

Remember: Wavy-line stuff in the handout is for reference only. You might be interested. You might refer to it in the future. You do not need to understand or learn it.

You should already know (from NST 1A or Handout 1): measures of central tendency (e.g. mean, median, mode); measures of dispersion (e.g. variance, standard deviation); histograms and distributions; the logic of null hypothesis testing.

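You should already know (from NST 1A or Handout 1):

mean: $\bar{x} = \frac{\sum x}{n}$

sample variance: $s_X^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

sample standard deviation (SD): $s_X = \sqrt{s_X^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$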

The relationship between two variables

Scatter plots show the relationship between two variables. [Figure: four scatter plots. Positive correlation; negative correlation; zero correlation (and no relationship); zero correlation (nonlinear relationship).]

Correlation There are no amusing or attractive pictures to do with correlation anywhere in the world.

The covariance measures how much two variables vary together. Good name, eh? Sample covariance:

$\mathrm{cov}_{XY} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$

There's another formula (below) that's easier to use if you have to calculate the covariance by hand. But you shouldn't need to if you can work your calculator, because... (next slide)

$\mathrm{cov}_{XY} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{n - 1}$
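As a rough check that the definition and the hand-calculation formula give the same covariance, here is a minimal Python sketch. The function names `cov_definition` and `cov_shortcut` are illustrative only, and the data are the shortest-angle group means from the mental rotation example used later in this session.

```python
# Shortest-angle data from the mental rotation example in this handout.
x = [0, 60, 120, 180, 120, 60]                                    # angle (degrees)
y = [884.1373, 977.2157, 1135.157, 1591.627, 1112.294, 952.6471]  # group mean RT (ms)

def cov_definition(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def cov_shortcut(x, y):
    """Same quantity via the hand-calculation formula (sum of xy minus a correction)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)

print(cov_definition(x, y))
print(cov_shortcut(x, y))
```

Both calls should print the same value up to floating-point rounding. The number is large because it carries the units of both variables (degrees times milliseconds), which is why a bare covariance is hard to interpret.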

r, the Pearson product moment correlation coefficient. The covariance is big and positive if there is a strong positive correlation, big and negative if there is a strong negative correlation, and zero if there's no correlation. But how big is big? That depends on the standard deviations (SDs) of X and Y, which isn't very helpful. So instead we calculate r:

$r_{XY} = \frac{\mathrm{cov}_{XY}}{s_X s_Y}$

r varies from -1 to +1. Your calculator calculates r. r does not depend on which way round X and Y are.

r WORKED EXAMPLE: mental rotation. Work through this one now with your calculator. We'll pause for a moment to do that...

Angle (degrees)    Group mean RT (ms) (simplest task)
0                  884.1373
60                 977.2157
120                1135.157
180                1591.627
240                1112.294
300                952.6471

r WORKED EXAMPLE: mental rotation. [Figure: two scatter plots of RT (ms) against angle (degrees).] First plot, raw angle: r = 0.252. Oops. Not very linear. Second plot, shortest angle (so 240° becomes 120°; 300° becomes 60°): r = 0.911. Much more linear.
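The two correlations quoted for this example can be reproduced with a short Python sketch. The helper name `pearson_r` is illustrative, and the shortest angle is computed here as min(angle, 360 - angle), which is an assumption that matches the recoding described above.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # The (n - 1) divisors in cov_XY, s_X and s_Y cancel, so they are omitted.
    return sxy / math.sqrt(sxx * syy)

angle = [0, 60, 120, 180, 240, 300]                                # degrees
rt = [884.1373, 977.2157, 1135.157, 1591.627, 1112.294, 952.6471]  # group mean RT (ms)

# Recode each angle as the shortest rotation (240 -> 120, 300 -> 60).
shortest = [min(a, 360 - a) for a in angle]

r_raw = pearson_r(angle, rt)
r_shortest = pearson_r(shortest, rt)
print(round(r_raw, 3), round(r_shortest, 3))
```

Swapping the arguments, as in `pearson_r(rt, shortest)`, gives the same value, since r does not depend on which variable is called X.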

Never give up! Never surrender! Never forget... Zero correlation doesn't imply no relationship. So always draw a scatter plot. Correlation does not imply causation.
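A made-up illustration of the first warning (the data below are invented for this sketch, not taken from the practical): Y is completely determined by X, yet r is zero because the relationship is not linear.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect U-shaped (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)
```

A scatter plot of these points would show the U shape immediately, which is the reason for always drawing one.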

Correlation, causation: a real-world example. Male college students (18–36 years old) engaged in 5 min conversation with a stooge, who was either male (23 or 32 y.o.) or female (19–23 y.o.). Didn't know that this was part of an experiment. Saliva samples taken before and after conversation. Testosterone (T) measured. Recent sexual experience = current relationship or sex in last 6 months. Stooge rated how much they thought the subject was displaying to them (e.g. talkative, showed off, tried to impress). Roney et al. (2003): n = 19; r = 0.52; $r^2$ = 0.27; p < 0.05.

Adjusted r. If we sample only 2 (x, y) points, we'll get a perfect correlation in the sample, r = 1 (or -1). But that doesn't mean that the correlation in the underlying population, ρ, is perfect! A better estimator of ρ is $r_{adj}$:

$r_{adj} = \sqrt{1 - \frac{(1 - r^2)(n - 1)}{n - 2}}$

Adjusted r: mental rotation example. With r = 0.911 and n = 6:

$r_{adj} = \sqrt{1 - \frac{(1 - r^2)(n - 1)}{n - 2}} = 0.887$
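The adjusted-r arithmetic can be checked with a few lines of Python. This is only a sketch of the formula on the slide; the function name `adjusted_r` is illustrative, and the guard for n <= 2 is an addition for safety rather than something the handout specifies.

```python
import math

def adjusted_r(r, n):
    """Adjusted correlation: sqrt(1 - (1 - r^2)(n - 1) / (n - 2))."""
    if n <= 2:
        # With 2 points the sample r is always +/-1 and the formula divides by zero.
        raise ValueError("adjusted r needs more than 2 (x, y) pairs")
    # Note: for small |r| and small n the quantity under the root can be negative;
    # this sketch does not handle that case and math.sqrt will raise an error.
    return math.sqrt(1 - (1 - r ** 2) * (n - 1) / (n - 2))

r_adj = adjusted_r(0.911, 6)
print(round(r_adj, 3))
```

The adjustment shrinks towards zero more strongly when n is small, and becomes negligible as n grows.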

Beware of sampling a restricted range

Beware outliers!

Heterogeneous subgroups: the oncologist and the magic forest

Is my correlation significant? Our first t test. Research hypothesis: the correlation in the underlying population from which the sample is drawn is non-zero (ρ ≠ 0). Null hypothesis: the correlation in the population is zero (ρ = 0). Calculate t:

$t_{n-2} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$

A big t (positive or negative) means that your data would be unlikely to be observed if the null hypothesis were true. Look up the critical level of t for n - 2 degrees of freedom in the Tables and Formulae. Values of t near zero are not significant. If your t is more extreme than the critical value, it is significant. If your t is significant, you reject the null hypothesis. Otherwise, you don't. We'll explain t tests properly in the next practical.

t test: mental rotation example. With r = 0.911 and n = 6:

$t_{n-2} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = 4.42$

Critical $t_4$ = 2.776 for α = 0.05, two-tailed.
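The same calculation as a minimal Python sketch. The function name `t_for_correlation` is illustrative, and the critical value is simply the tabulated figure quoted above (4 degrees of freedom, α = 0.05, two-tailed), typed in by hand rather than computed.

```python
import math

def t_for_correlation(r, n):
    """t statistic with n - 2 degrees of freedom for testing rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.911, 6
t = t_for_correlation(r, n)

# Critical value copied from the table, not calculated here.
critical_t = 2.776

significant = abs(t) > critical_t  # two-tailed: compare the magnitude of t
print(round(t, 2), significant)
```

Here t exceeds the critical value, so the null hypothesis of zero correlation in the population would be rejected at the 0.05 level.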

Assumptions you make when you test hypotheses about ρ. Basically, the data shouldn't look too weird. We must assume: that the variance of Y is roughly the same for all values of X (this is termed homogeneity of variance or homoscedasticity; its opposite is heterogeneity of variance or heteroscedasticity); that X and Y are both normally distributed; that for all values of X, the corresponding values of Y are normally distributed, and vice versa.

Heteroscedasticity is a Bad Thing (and Hard to Spell). This would violate an assumption of testing hypotheses about ρ. So if you run the t test on r, the answer may be meaningless.

$r_s$: Spearman's correlation coefficient for ranked data. Rank the X values. Rank the Y values. Correlate the X ranks with the Y ranks. (You do this in the normal way for calculating r, but you call the result $r_s$.) To ask whether the correlation is significant, use the table of critical values of Spearman's $r_s$ in the Tables and Formulae booklet.

How to rank data. Suppose we have ten measurements (e.g. test scores) and want to rank them. First, place them in ascending numerical order: 5 8 9 12 12 15 16 16 16 17. Then start assigning them ranks. When you come to a tie, give each value the mean of the ranks they're tied for. For example, the 12s are tied for ranks 4 and 5, so they get the rank 4.5; the 16s are tied for ranks 7, 8, and 9, so they get the rank 8:

X       rank
5       1
8       2
9       3
12      4.5
12      4.5
15      6
16      8
16      8
16      8
17      10
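The ranking rule, and the Spearman coefficient built on it, can be sketched in Python as follows. The function names `mean_ranks`, `pearson_r` and `spearman_rs` are illustrative; the first data set is the ten scores above, while the second (x and its cubes) is invented here to show a perfectly monotonic but nonlinear relationship.

```python
import math

def mean_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of the 1-based ranks in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's r_s: the ordinary r calculated on the ranks."""
    return pearson_r(mean_ranks(x), mean_ranks(y))

scores = [5, 8, 9, 12, 12, 15, 16, 16, 16, 17]
print(mean_ranks(scores))

# Hypothetical monotonic, nonlinear data.
x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]
print(spearman_rs(x, y))
```

The input to `mean_ranks` does not need to be sorted first; ranks are returned in the original order of the data, which is what is needed when pairing X ranks with Y ranks.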

Regression

Linear regression: predicting things from other things. $\hat{Y} = bX + a$. But which line? Which values of a and b?

Least squares regression: finding a and b.

Predicted value of Y: $\hat{y} = bx + a$
Error in prediction (residual): $y - \hat{y}$
Squared error: $(y - \hat{y})^2$
Sum of squared errors: $\sum (y - \hat{y})^2$

We pick the values of a and b that minimize the sum of the squared errors.

We've found our best line.

$\hat{Y} = bX + a$

$b = \frac{\mathrm{cov}_{XY}}{s_X^2} = r\frac{s_Y}{s_X}$

$a = \bar{y} - b\bar{x}$

Our line will always pass through (0, a) and ($\bar{x}$, $\bar{y}$).
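A minimal Python sketch of these formulas, applied to the shortest-angle mental rotation data from this handout. The function name `least_squares_line` is illustrative.

```python
import math

def least_squares_line(x, y):
    """Return (a, b) for the least squares line y-hat = b*x + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b = cov_xy / var_x        # slope: covariance over the variance of X
    a = mean_y - b * mean_x   # intercept: forces the line through the point of means
    return a, b

x = [0, 60, 120, 180, 120, 60]                                    # shortest angle (degrees)
y = [884.1373, 977.2157, 1135.157, 1591.627, 1112.294, 952.6471]  # group mean RT (ms)

a, b = least_squares_line(x, y)
print(round(b, 4), round(a, 2))

# The fitted line passes through (mean of x, mean of y).
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
print(math.isclose(b * mean_x + a, mean_y))
```

The slope is in milliseconds per degree and the intercept is the predicted RT at an angle of zero.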

Beware extrapolating beyond the original data

Predicting Y from X is not the same as predicting X from Y!
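One way to see this numerically, sketched in Python with the shortest-angle mental rotation data from this handout (the variable names are illustrative): the slope for predicting Y from X divides the covariance by the variance of X, whereas the slope for predicting X from Y divides it by the variance of Y. Rearranging the first line to solve for X does not give the second line unless r is exactly +1 or -1.

```python
x = [0, 60, 120, 180, 120, 60]                                    # shortest angle (degrees)
y = [884.1373, 977.2157, 1135.157, 1591.627, 1112.294, 952.6471]  # group mean RT (ms)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_y_on_x = sxy / sxx  # ms per degree: minimizes squared errors in Y
b_x_on_y = sxy / syy  # degrees per ms: minimizes squared errors in X

# Slope you would get by simply rearranging the Y-on-X line to solve for X.
b_inverted = 1 / b_y_on_x

r_squared = sxy ** 2 / (sxx * syy)

print(b_x_on_y, b_inverted)
print(b_y_on_x * b_x_on_y, r_squared)
```

The product of the two regression slopes equals $r^2$, so the two lines coincide only when the correlation is perfect.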

$r^2$ means something important: the proportion of the variability in Y predictable from the variability in X.
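This interpretation can be checked directly: fit the least squares line, then compare the variability left over after prediction with the total variability in Y. The sketch below uses the shortest-angle mental rotation data from this handout; the names `sse` (sum of squared errors) and `sst` (total sum of squares) are labels chosen for this example.

```python
x = [0, 60, 120, 180, 120, 60]                                    # shortest angle (degrees)
y = [884.1373, 977.2157, 1135.157, 1591.627, 1112.294, 952.6471]  # group mean RT (ms)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Least squares line y-hat = b*x + a.
b = sxy / sxx
a = mean_y - b * mean_x

# Variability in Y not predicted by X, and total variability in Y.
sse = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
sst = syy

proportion_predicted = 1 - sse / sst
r_squared = sxy ** 2 / (sxx * syy)

print(round(proportion_predicted, 4), round(r_squared, 4))
```

The two printed values agree, so for these data roughly 83% of the variability in RT is predictable from the shortest angle.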

Doing it for real

Shortest angle (degrees)    RT (ms)
0                           884.137
60                          977.216
120                         1135.157
180                         1591.627
120                         1112.294
60                          952.647

Mental rotation: regression line (group means, simplest task). [Figure: RT (ms) against angle (degrees) with the fitted regression line.] y = 3.6971x + 776.11; $R^2$ = 0.8302; r = 0.911.

11 am, 11 November: Armistice Day