
Section 12.5 Correlation

In the previous sections, we looked at regression, and the value $r^2$ was a measurement of how much of the variation in y can be attributed to the linear relationship between y and x. In this section, we are going to look at a related concept, correlation, which goes beyond regression and is a measurement of how strong a linear relationship there is between two random variables, X and Y.

The Sample Correlation Coefficient r

Given n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, if y is large when x is large, then they have a strong positive relationship. If y is small when x is large, then they have a strong negative relationship. Consider then

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

Here is the rationale: if x and y have a strong positive relationship to one another, then $(x_i - \bar{x})$ will tend to be positive when $(y_i - \bar{y})$ is also positive (when the value of x is above its mean, the value of y will likewise be above its mean). On the flip side, when $(x_i - \bar{x})$ is negative, $(y_i - \bar{y})$ will also be negative (so when the x's fall below the mean of x, the y's likewise fall below the mean of y). Either way, the products are positive, so $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) > 0$, suggesting a strong positive linear relationship. On the other hand, if they are strongly negatively related, then when $(x_i - \bar{x}) < 0$, typically $(y_i - \bar{y}) > 0$, and vice versa, so that $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) < 0$.

So $S_{xy}$ is a way to measure how strong a linear relationship between two variables is, but there is one flaw: its size depends on the units of measurement. If the units of measurement are small, $S_{xy}$ is small, and if the units of measurement are big, $S_{xy}$ will be big. So we want to come up with a way to rescale it so that we can objectively decide whether the relationship is strong or weak, regardless of the units of measurement. Instead of using $S_{xy}$ we use

$$r = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

r is called the sample correlation coefficient.

Properties of r

1. The value of r does not depend on which of the two variables under study is labeled x and which is labeled y.
2. The value of r is independent of the units in which x and y are measured.

3. $-1 \le r \le 1$
4. $r = 1$ if and only if all (x, y) pairs lie on a straight line with positive slope, and $r = -1$ if and only if all (x, y) pairs lie on a straight line with negative slope.
5. The square of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model. In other words, the r's are the same: r = the sample correlation coefficient; $r^2$ = the coefficient of determination.

When is correlation strong/weak?

Weak: $0 \le |r| \le .5$
Moderate: $.5 < |r| < .8$
Strong: $.8 \le |r| \le 1$

Example 1: Page 49 #59

Toughness and fibrousness of asparagus are major determinants of quality. This was the focus of a study reported in the Journal of the American Society of Horticultural Science. The article reported the accompanying data on x = shear force (in kg) and y = percent fiber dry weight.

x: 46 48 55 57 60 72 81 85 94
y: 2.18 2.10 2.13 2.28 2.34 2.53 2.28 2.62 2.63

x: 109 121 132 137 148 149 184 185 187
y: 2.50 2.66 2.79 2.80 3.01 2.98 3.34 3.49 3.26

$n = 18$, $\sum x_i = 1950$, $\sum x_i^2 = 251{,}970$, $\sum y_i = 47.92$, $\sum y_i^2 = 130.6074$, $\sum x_i y_i = 5530.92$

a. Calculate the value of the sample correlation coefficient. Based on this value, how would you describe the nature of the relationship between the two variables?

b. If a first specimen has a larger value of shear force than does a second specimen, what tends to be true of the percent dry fiber weight for the two specimens?
Answer: It also tends to be larger.

c. If shear force is expressed in pounds, what happens to the value of r? Why?
Answer: No change. This is the purpose of using the correlation coefficient: it is not affected by a change of units.

d. If the simple linear regression model were fit to this data, what proportion of observed variation in percent fiber dry weight could be explained by the model relationship?
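Parts (a) and (d) can be checked numerically with a short sketch like the one below. It assumes the summary values are n = 18, Σx = 1950, Σx² = 251,970, Σy = 47.92, Σy² = 130.6074 and Σxy = 5530.92, and it uses the computational forms S_xx = Σx² − (Σx)²/n, S_yy = Σy² − (Σy)²/n and S_xy = Σxy − (Σx)(Σy)/n. The variable names are illustrative only.

```python
import math

# Summary statistics for the asparagus data (assumed values, n = 18 specimens).
n = 18
sum_x, sum_x2 = 1950.0, 251970.0   # sum of x and of x squared (shear force, kg)
sum_y, sum_y2 = 47.92, 130.6074    # sum of y and of y squared (percent fiber dry weight)
sum_xy = 5530.92                   # sum of the products x*y

# Computational forms of the sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Sample correlation coefficient and coefficient of determination.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

print(f"S_xx = {s_xx:.2f}, S_yy = {s_yy:.4f}, S_xy = {s_xy:.4f}")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

Under these inputs r comes out at about 0.966, a strong positive linear relationship by the scale above, and r² is about 0.934, which is the proportion asked for in part (d).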

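The Population Correlation Coefficient ρ and Inferences about Correlation

The correlation coefficient r is a measure of how strongly related x and y are in the observed sample. We can think of the pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ as being drawn from a bivariate population of pairs, with (X, Y) having joint pdf f(x, y). This goes back to the definition of the correlation coefficient in chapter 5:

$$\rho = \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

$$\mathrm{Cov}(X, Y) = \begin{cases} \displaystyle\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) & (x, y \text{ discrete}) \\[2ex] \displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy & (x, y \text{ continuous}) \end{cases}$$

The formula for the population correlation coefficient looks very similar to the formula we just developed for the sample correlation coefficient. Thus r is an estimator for ρ:

$$\hat{\rho} = r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}}$$

We are going to make inferences about ρ, so we will need to discuss a new distribution, the bivariate normal distribution. The joint pdf of (X, Y) is specified by

$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\, \frac{(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 \right] \right\}, \qquad -\infty < x < \infty,\ -\infty < y < \infty$$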
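As an informal check that r behaves as an estimator of ρ, the following sketch simulates pairs from a standard bivariate normal distribution with a chosen ρ and computes r from the sample. Everything here is an assumption made for illustration: the value ρ = 0.6, the sample size 5000, the seed, and the helper names `draw_bivariate_normal` and `sample_correlation`. The pairs are built as X = Z1 and Y = ρ·Z1 + sqrt(1 − ρ²)·Z2 from independent standard normals Z1 and Z2, which gives Corr(X, Y) = ρ.

```python
import math
import random


def sample_correlation(xs, ys):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def draw_bivariate_normal(n, rho, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # Mixing z1 and z2 this way gives Var(Y) = 1 and Cov(X, Y) = rho.
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


rng = random.Random(0)   # fixed seed so the run is repeatable
rho = 0.6                # hypothetical population correlation
xs, ys = draw_bivariate_normal(5000, rho, rng)
r_hat = sample_correlation(xs, ys)
print(f"population rho = {rho}, sample r = {r_hat:.3f}")
```

With a sample this large, r should land close to the chosen ρ; smaller samples from the same bivariate normal distribution give noticeably more variable values of r.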

We only need to know some of its properties:

1. It is a three-dimensional bell-shaped surface that lies entirely above the xy plane. If we slice through the surface with any plane perpendicular to the xy plane, we get a normal curve.
   a. In regression, all of the work we did was based on x being fixed in advance, with only y random. That corresponds to slicing the graph of the bivariate normal distribution with the plane X = x. This leads to the conditional distribution of Y given X = x, which is normal with mean $\mu_{Y \cdot x} = \mu_Y + \rho\, \dfrac{\sigma_Y}{\sigma_X}\, (x - \mu_X)$, a linear function of x.
2. This implies that if the ordered pairs (x, y) are drawn from a bivariate normal distribution, then the simple linear regression model is an appropriate way of studying the behavior of Y for fixed x.
3. Assuming that the pairs are drawn from a bivariate normal distribution allows us to test hypotheses about ρ and to construct CIs. The tests we will learn cannot be done when n is small.

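4. Bivariate normality implies that the marginal distributions of both X and Y are normal.
5. There is no real way to test for bivariate normality, other than doing normal probability plots for x and y individually. If either is far off from being normal, you cannot use the bivariate normal distribution.

Testing for Absence of Correlation

When $H_0: \rho = 0$ is true, the test statistic

$$T = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}}$$

has a t distribution with n − 2 degrees of freedom.

Alternative hypotheses and rejection regions for a level α test:

$H_a: \rho > 0$: reject $H_0$ if $t \ge t_{\alpha,\, n-2}$
$H_a: \rho < 0$: reject $H_0$ if $t \le -t_{\alpha,\, n-2}$
$H_a: \rho \ne 0$: reject $H_0$ if $t \ge t_{\alpha/2,\, n-2}$ or $t \le -t_{\alpha/2,\, n-2}$

When the null hypothesis $H_0: \rho = 0$ is true, there is no linear relationship between the two random variables. In section 12.3, we did a hypothesis test using $H_0: \beta_1 = 0$ as the null hypothesis, with test statistic $T = \hat{\beta}_1 / s_{\hat{\beta}_1}$. If the null was true, then the slope $\beta_1$ of the regression line would be 0, hence the true regression line is horizontal, with no dependence on the variable x at all (in other words, under bivariate normality the two variables are independent). This is actually the same test as the one above because

$$T = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}} = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$$

Example 1 continued

Recall the asparagus data, with x = shear force (in kg) and y = percent fiber dry weight:

$n = 18$, $\sum x_i = 1950$, $\sum x_i^2 = 251{,}970$, $\sum y_i = 47.92$, $\sum y_i^2 = 130.6074$, $\sum x_i y_i = 5530.92$

e. Carry out a test at significance level .01 to decide whether there is a positive linear association between the two variables.

Other Inferences Concerning ρ

To test $H_0: \rho = \rho_0$ when $\rho_0 \ne 0$, we must use a transformed random variable, called the Fisher transformation. When $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample from a bivariate normal distribution, the RV

$$V = \frac{1}{2} \ln\left(\frac{1 + R}{1 - R}\right)$$

has approximately a normal distribution with mean and variance

$$\mu_V = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right) \quad \text{and} \quad \sigma_V^2 = \frac{1}{n - 3}$$

This cannot be used if n is small.

Hypothesis tests of the form $H_0: \rho = \rho_0$

Test statistic:

$$Z = \frac{V - \frac{1}{2} \ln\left(\dfrac{1 + \rho_0}{1 - \rho_0}\right)}{1 / \sqrt{n - 3}}$$

Alternative and rejection region for a level α test:

$H_a: \rho > \rho_0$: reject $H_0$ if $z \ge z_{\alpha}$
$H_a: \rho < \rho_0$: reject $H_0$ if $z \le -z_{\alpha}$
$H_a: \rho \ne \rho_0$: reject $H_0$ if $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

Confidence intervals for ρ

Just as we did before, we obtain a CI by beginning with our RV, V in this case, standardizing it, and using the normal table to find the value of $z_{\alpha/2}$. This gives a $100(1 - \alpha)\%$ interval for $\mu_V$,

$$(c_1, c_2) = \left( v - \frac{z_{\alpha/2}}{\sqrt{n - 3}},\ v + \frac{z_{\alpha/2}}{\sqrt{n - 3}} \right)$$

and solving $\mu_V = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right)$ for ρ converts it into an interval for ρ:

$$\left( \frac{e^{2c_1} - 1}{e^{2c_1} + 1},\ \frac{e^{2c_2} - 1}{e^{2c_2} + 1} \right)$$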
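The test statistics and the interval in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: it takes T = r·sqrt(n − 2)/sqrt(1 − r²) for the test of ρ = 0, V = ½·ln((1 + r)/(1 − r)) with variance 1/(n − 3) for the Fisher transformation, and it inverts V through tanh, which is the same map as (e^{2c} − 1)/(e^{2c} + 1). The function names are hypothetical, the value r = 0.9662 with n = 18 is the result of the calculation for Example 1, and the inputs r = 0.5, n = 28 in the interval call are made-up numbers used only to exercise the code.

```python
import math
from statistics import NormalDist


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


def fisher_v(r):
    """Fisher transformation V = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_z_statistic(r, n, rho0):
    """Approximate z statistic for H0: rho = rho0 (intended for larger n)."""
    return (fisher_v(r) - fisher_v(rho0)) * math.sqrt(n - 3)


def fisher_ci(r, n, confidence):
    """Approximate confidence interval for rho based on the Fisher transformation."""
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z_crit / math.sqrt(n - 3)
    v = fisher_v(r)
    # tanh(c) equals (e^(2c) - 1) / (e^(2c) + 1), the inverse of the Fisher map.
    return math.tanh(v - half_width), math.tanh(v + half_width)


# Example 1(e): positive association at level .01, assuming r = 0.9662 and n = 18.
t_value = t_statistic(0.9662, 18)
T_CRIT_01_16 = 2.583  # upper-tail t critical value for alpha = .01, 16 df, from a t table
print(f"t = {t_value:.2f}, reject H0: {t_value >= T_CRIT_01_16}")

# Hypothetical inputs, only to show the interval and the z statistic in use.
lower, upper = fisher_ci(0.5, 28, 0.95)
z_value = fisher_z_statistic(0.5, 28, 0.3)
print(f"95% CI for rho: ({lower:.3f}, {upper:.3f}), z = {z_value:.3f}")
```

For Example 1(e) the t statistic is close to 15, far beyond the tabled critical value, so the data give strong evidence of a positive linear association.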

Example 2: The article "Increases in Steroid Binding Globulins Induced by Tamoxifen in Patients with Carcinoma of the Breast" reports data on the effects of the drug tamoxifen on change in the level of cortisol-binding globulin (CBG) of patients during treatment. With age = x and change in CBG = y, summary values are [summary values missing].

a. Compute a 90% CI for the true correlation coefficient ρ.

b. Test $H_0: \rho = .5$ vs. $H_a: \rho < .5$.

c. In a regression analysis of y on x, what proportion of the variation in change in cortisol-binding globulin level could be explained by variation in patient age within the sample?

d. If you decide to perform a regression analysis with age as the dependent variable, what proportion of variation in age is explainable by variation in the change in CBG?