Linear Correlation. Many research issues are pursued with nonexperimental studies that seek to establish relationships among 2 or more variables


Many research issues are pursued with nonexperimental studies that seek to establish relationships among two or more variables. Examples: correlates of intelligence; the relation between SAT and GPA; the relation between working memory size and speed of problem solving.

This type of research cannot establish pathways of cause and effect.

Relationships among variables can take on many forms.

Correlation is also used to establish test reliability.

Scatter Plots

[Figure: four scatter plots showing a negative linear relation, a perfect positive linear relation, a positive linear relation, and a positive non-linear relation.]

[Figure: scatter plots of a negative linear relation, a positive linear relation, and a positive non-linear relation, each with its best-fitting line.]

Note the non-systematic deviation of the points around the best-fitting line in the linear cases. This suggests a linear relationship between X and Y: $Y = a + bX$.

Note the systematic deviation of the points around the best-fitting line in the non-linear case. This suggests a non-linear relationship.

[Figure: the same positive non-linear relation fitted twice. The straight line $Y = a + bX$ gives a poor fit; a curved equation with three parameters (a, b, c) gives a good fit.]

Many other non-linear relationships between X and Y are also possible, e.g., $Y = a + b \log X$.

We will only consider linear relationships. There are two related descriptive issues.

How strong is the relationship? (Chapter 6)
What is the linear correlation between the variables? The measure we focus on is called the linear correlation coefficient, or the Pearson product-moment correlation. We will also consider a version for ordinal data. There are many other coefficients of relationship as well.

What is the relationship? (Chapter 7)
I.e., what equation can one use to predict Y from X (or vice versa)? Note that different equations are needed to predict in each direction.

Another way to phrase the strength-of-relationship question is: how well does the standard score (z-score) of one variable predict the standard score (z-score) of the other? I.e., does knowing how many s.d.'s above or below the mean one variable is tell one how many s.d.'s above or below the mean the other variable is?

This phrasing makes it meaningful to relate measures on different scales (e.g., height and weight) or of different values on the same scale (e.g., heights of children and parents).

We want a scale that runs from +1 to -1, that is, from a perfect positive to a perfect negative linear relation.

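To illustrate, consider the following data:

   X     Y     z_X     z_Y   z_X z_Y
   1     2   -1.49   -1.49    2.21
   2     4   -1.16   -1.16    1.34
   3     6   -0.83   -0.83    0.68
   4     8   -0.50   -0.50    0.25
   5    10   -0.17   -0.17    0.03
   6    12    0.17    0.17    0.03
   7    14    0.50    0.50    0.25
   8    16    0.83    0.83    0.68
   9    18    1.16    1.16    1.34
  10    20    1.49    1.49    2.21

Mean: 5.5 (X), 11.0 (Y). S.D.: 3.03 (X), 6.06 (Y).
Sum of the products $z_X z_Y$: 9.00. Sum/(N - 1): 1.00.

[Figure: scatter plot of these data, a perfect positive linear relation.]

Consider the equation

$$r = \frac{\sum z_X z_Y}{N - 1}, \quad \text{where } z_A = \frac{A - \bar{A}}{s_A} \text{ for } A = X \text{ and } A = Y.$$

Note that the product $z_X z_Y$ is positive if X and Y are on the same side of their means and negative if they are on opposite sides.

r = +1 if the relationship is perfectly positively linear and r = -1 if it is perfectly negatively linear. It is between +1 and -1 otherwise, which happens when the relationship contains random error or is non-linear.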
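As a minimal sketch of the z-score definition of r, the following Python computes the standard deviations with N - 1 in the denominator, forms the z-score products, and divides their sum by N - 1. The function name `pearson_r_from_z` is made up for this illustration, and the data are the ten pairs of the illustration (X = 1 to 10, Y = 2X).

```python
import math


def pearson_r_from_z(xs, ys):
    """Pearson r as the sum of z-score products divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (N - 1 in the denominator).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # One z-score product per case.
    z_products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


xs = list(range(1, 11))      # X = 1, 2, ..., 10
ys = [2 * x for x in xs]     # Y = 2X, a perfect positive linear relation
r = pearson_r_from_z(xs, ys)
print(round(r, 2))           # 1.0
```

Reversing the sign of Y (for example `ys = [-2 * x for x in xs]`) should give r = -1, the perfect negative case.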

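There is another way to calculate the correlation coefficient. Expanding the z-scores (lots of algebra occurs here) gives a computational equation that may be easier to use:

$$r = \frac{\sum z_X z_Y}{N - 1} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)\, s_X s_Y} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$$

For the same data (Mean 5.5 and 11.0; S.D. 3.03 and 6.06):

   X     Y    X^2    Y^2     XY
   1     2     1      4      2
   2     4     4     16      8
   3     6     9     36     18
   4     8    16     64     32
   5    10    25    100     50
   6    12    36    144     72
   7    14    49    196     98
   8    16    64    256    128
   9    18    81    324    162
  10    20   100    400    200
Sum 55   110   385   1540    770

$$r = \frac{770 - \dfrac{(55)(110)}{10}}{\sqrt{\left[385 - \dfrac{55^2}{10}\right]\left[1540 - \dfrac{110^2}{10}\right]}} = \frac{165}{\sqrt{(82.5)(330)}} = 1.00$$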
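The computational equation needs only five running sums, which the sketch below makes explicit. It is an illustration under the assumption that the formula uses the sums of X, Y, X², Y² and XY with N cases; the function name `pearson_r_computational` is hypothetical.

```python
import math


def pearson_r_computational(xs, ys):
    """Pearson r from raw sums, without computing any z-scores."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


xs = list(range(1, 11))      # X = 1, 2, ..., 10
ys = [2 * x for x in xs]     # Y = 2X
r = pearson_r_computational(xs, ys)
print(sum(xs), sum(ys), sum(x * y for x, y in zip(xs, ys)))  # 55 110 770
print(round(r, 2))                                           # 1.0
```

Because the sums can be accumulated in a single pass over the data, this form suits hand calculation; with very large values the subtraction can lose precision, so the z-score form is the safer default in software.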

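Another Example

   X     Y     z_X     z_Y   z_X z_Y
  13     8   -1.56   -1.49    2.33
  17    19   -0.52   -0.74    0.39
  19    29    0.00   -0.07    0.00
  19    38    0.00    0.58    0.00
  22    36    0.78    0.42    0.33
  24    49    1.30    1.30    1.69

Mean: 19.00 (X), 29.83 (Y). S.D.: 3.85 (X), 14.70 (Y). Sum of the products $z_X z_Y$: 4.74.

[Figure: scatter plot of Y against X for these data.]

$$r = \frac{4.74}{5} = .95$$

Proportion of Variance Accounted For

$r^2$ has a very simple interpretation in terms of the improvement in predictability of Y provided by knowledge of X over that obtained from the mean of Y alone.

In the absence of any knowledge of X, one can do no better than use $\bar{Y}$ as the best guess for Y.

When X and Y are linearly related, then $\hat{Y} = a + bX$ gives a better predicted value.

[Figure: scatter plot with the best-fitting line.]

Note the following:

$$(Y - \bar{Y}) = (Y - \hat{Y}) + (\hat{Y} - \bar{Y})$$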
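To make the "improvement in predictability" concrete, here is a small Python sketch that compares the squared error from guessing the mean of Y for every case with the squared error from a fitted straight line. Two assumptions are mine rather than the text's: the six (X, Y) pairs are the values as I read them from the second example, and the line is fitted with the standard least-squares formulas for a and b, which the text does not state at this point.

```python
# Six (X, Y) pairs as read from the second example; treat them as an assumption.
xs = [13, 17, 19, 19, 22, 24]
ys = [8, 19, 29, 38, 36, 49]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products of deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares line Y_hat = a + b*X (standard formulas, assumed here).
b = sxy / sxx
a = mean_y - b * mean_x
y_hat = [a + b * x for x in xs]

# Squared error when the mean of Y is the guess for every case.
ss_mean_guess = syy
# Squared error when Y is predicted from X with the fitted line.
ss_line_guess = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

improvement = 1 - ss_line_guess / ss_mean_guess
r = sxy / (sxx * syy) ** 0.5

print(round(r, 2))            # 0.95
print(round(improvement, 2))  # 0.9
print(round(r ** 2, 2))       # 0.9
```

The proportional reduction in squared error matches r squared, which is the sense in which r² measures the variance in Y accounted for by X.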

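Starting from $(Y - \bar{Y}) = (Y - \hat{Y}) + (\hat{Y} - \bar{Y})$, squaring and summing over all cases (lots of algebra occurs next) gives

$$\sum (Y - \bar{Y})^2 = \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \bar{Y})^2$$

$\sum (Y - \bar{Y})^2$: total variability of Y (sum of squared deviations of Y about its mean). Stays fixed.

$\sum (Y - \hat{Y})^2$: sum of squared deviations of predicted from actual scores. Gets smaller as prediction improves.

$\sum (\hat{Y} - \bar{Y})^2$: total variability of Y accounted for by X (sum of squared deviations of $\hat{Y}$ about its mean). Gets larger as prediction improves.

Coefficient of determination:

$$r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

Some Factors Affecting r

Restricting the range of the data decreases the correlation.

Eliminating the middle portion of the data, or adding extreme scores, inflates the correlation.

[Figure: scatter plot of Y against X. The correlation within the restricted middle range of X is lower than the overall correlation; the correlation computed from only the low and high extremes of X is higher, r = .89.]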
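The effect of range restriction can be checked with a quick simulation. The sketch below is illustrative only: the data are simulated (X uniform on 0 to 100, Y equal to X plus normal noise with a standard deviation of 30), and the cut-offs 40, 60, 20 and 80 are arbitrary choices of mine, not the values behind the figure.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r from deviation sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_for(subset):
    """Pearson r for a list of (x, y) pairs."""
    return pearson_r([x for x, _ in subset], [y for _, y in subset])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
xs = [rng.uniform(0, 100) for _ in range(5000)]
ys = [x + rng.gauss(0, 30) for x in xs]  # simulated linear relation plus noise
pairs = list(zip(xs, ys))

r_all = r_for(pairs)
# Restricted range: keep only the middle of the X distribution.
r_middle = r_for([(x, y) for x, y in pairs if 40 < x < 60])
# Extremes only: drop the middle portion of the data.
r_extremes = r_for([(x, y) for x, y in pairs if x < 20 or x > 80])

print(f"overall r       = {r_all:.2f}")
print(f"middle range r  = {r_middle:.2f}")
print(f"extremes only r = {r_extremes:.2f}")
```

The same underlying relation yields three quite different correlations, which is why the range of the sampled scores should be reported alongside r.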

Ordinal Data

If the data are ordinal in nature, or the relationship is distinctly non-linear, then it is inappropriate to use the Pearson product-moment correlation, as it assumes a linear relationship and requires at least interval-scaled data.

There are various alternatives. One is the Spearman correlation, which is the Pearson correlation calculated on ranks.

You can use the equation in the book, or the prior equations with the ranks in place of the raw data. For the latter, convert the X scores to ranks 1 to N, convert the Y scores to ranks 1 to N, then proceed as we did with the prior examples.

Example: group size and decision time (number of minutes) for five groups.

[Figure: decision time in minutes plotted against group size.]

For these data Pearson r is about .9, whereas Spearman $r_s$ = 1.00.
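The rank-based recipe can be sketched in a few lines of Python: rank each variable from 1 to N, then apply the ordinary Pearson calculation to the ranks. The group sizes and decision times below are hypothetical values invented for this illustration (they are not the data of the example), and tied scores receive the average of the ranks they span, a common convention that the text does not spell out.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson r from deviation sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(xs, ys):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: decision time rises with group size, but not linearly.
group_size = [1, 2, 4, 8, 16]
decision_time = [5.0, 9.0, 14.0, 17.0, 18.0]

pearson = pearson_r(group_size, decision_time)
spearman = spearman_r(group_size, decision_time)
print(round(pearson, 2), round(spearman, 2))
```

Because decision time never decreases as group size grows, the ranks agree perfectly and the Spearman value is 1, while the Pearson value on the raw scores falls short of 1 because the relation is curved.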