Inferences for Correlation

Quantitative Methods II

Plan for Today
- Recall: correlation coefficient
- Bivariate normal distributions
- Hypothesis testing for population correlation
- Confidence intervals for population correlation

Bivariate Analysis

Is there a relationship between two variables? For example, is there a relationship between a person's income and their level of educational attainment? Questions of this type are studied in bivariate analysis, where "bi" indicates two variables. Statisticians also work with more than two variables, which leads to multivariate analysis.

Correlation Coefficient r

The correlation coefficient r was developed by Karl Pearson around the turn of the twentieth century as a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.

The value of r is always between -1 and 1. If r > 0, the correlation is positive (when x increases, y tends to increase as well). If r < 0, the correlation is negative (when x increases, y tends to decrease).

Correlation Coefficient r

r = -1 : perfect negative correlation
-1 < r < -0.6 : strong negative correlation
-0.6 < r < -0.3 : moderate negative correlation
-0.3 < r < 0 : weak negative correlation
r = 0 : no correlation
0 < r < 0.3 : weak positive correlation
0.3 < r < 0.6 : moderate positive correlation
0.6 < r < 1 : strong positive correlation
r = 1 : perfect positive correlation

A formula for the coefficient of correlation:

$$r = \frac{s_x}{s_y}\, b$$

where $s_x$ and $s_y$ are the standard deviations of the x- and y-data respectively, and the slope is

$$b = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$$

However, the correlation coefficient is much faster and easier to compute using your scientific calculator!
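The slope-based formula and the strength scale above can be sketched in a few lines of Python. This is only an illustration: the function names `pearson_r` and `describe` are hypothetical, the data are the study hours and grades used in a later example of these notes, and the handling of the boundary values 0.3 and 0.6 (which the scale leaves unassigned) is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Correlation via r = (s_x / s_y) * b, where b is the least-squares slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    mean_y2 = sum(y * y for y in ys) / n
    # Slope b = (mean(xy) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
    b = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    # Standard deviations with divisor n; the ratio s_x / s_y is the same
    # whether divisor n or n - 1 is used, as long as both use the same one.
    s_x = math.sqrt(mean_x2 - mean_x ** 2)
    s_y = math.sqrt(mean_y2 - mean_y ** 2)
    return (s_x / s_y) * b

def describe(r):
    """Verbal label from the scale above (boundary handling is an assumption)."""
    if r == 0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if math.isclose(size, 1.0):
        return f"perfect {sign} correlation"
    if size < 0.3:
        level = "weak"
    elif size < 0.6:
        level = "moderate"
    else:
        level = "strong"
    return f"{level} {sign} correlation"

hours = [2, 5, 1, 4, 2]
grades = [80, 80, 70, 90, 60]
r = pearson_r(hours, grades)
print(round(r, 4), describe(r))
```

For these five pairs the sketch gives r of about 0.6138, which falls in the "strong positive" band of the scale.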

Salaries and education

The table on the left represents a sample of 11 individuals working for the government of Quebec, their annual salary in thousands of dollars, and their educational attainment in years.

Computing the correlation coefficient

Using the formulas from before or the built-in calculator functions, compute: r = 0.6395

This is an example of a strong positive correlation.

Bivariate normal distribution

We shall always assume that the set (x, y) of ordered pairs of data comes from a bivariate normal distribution. In particular, for a fixed value of x the values of y are normally distributed, and for a fixed value of y the values of x are normally distributed as well.

In most cases, the results are still accurate if the distributions are bell-shaped and symmetrical, and the y-variances are approximately equal.

Hypothesis testing for correlation

The population correlation is denoted by the Greek letter ρ ("rho") and the sample correlation by r.

The null hypothesis is always going to be that the values of x and y have no linear correlation, that is, $H_0: \rho = 0$.

The alternative hypothesis will always be $H_A: \rho \neq 0$ (two-tailed test).

The test statistic

To test hypotheses for a population correlation, we are going to use the Student's t distribution with n - 2 degrees of freedom:

$$df = n - 2$$

The test statistic t is given by

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

Here n is the number of pairs of data (x, y).

Example: study hours and grades

Five students have recorded the number of hours they studied for an exam and their grades:

Hours:   2   5   1   4   2
Grade:  80  80  70  90  60

Assuming a bivariate normal population, test at a 5% level of significance whether the correlation between the number of hours of study and the grade is significant.

Example: study hours and grades

First of all, let us compute the sample correlation coefficient, using formulas or a calculator. We have r = 0.6138.

State the hypotheses: $H_0: \rho = 0$, $H_A: \rho \neq 0$. (A two-tailed test.)

The test statistic:

$$t = \frac{0.6138\sqrt{5-2}}{\sqrt{1-0.6138^2}} = 1.35$$

The number of degrees of freedom is df = 3.

The critical values: $\pm t_{3,\,0.025} = \pm 3.182$

The p-value $\approx 2 \times 0.1354 \approx 0.271 > 0.05 = \alpha$

Decision: fail to reject $H_0$.

Example: reading time and TV

Do reading and TV viewing compete for leisure time? To find out, a psychologist interviewed a random sample of 15 children regarding the number of books they had read during the last year and the number of hours they had spent watching TV on a daily basis. If a correlation coefficient of -0.715 is obtained, is the correlation significant at the 5% level of significance? Assume that it is a bivariate normal population.
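The test procedure used in these examples can be sketched with the Python standard library alone. This is a rough check under stated assumptions, not a replacement for tables or statistical software: the function names are hypothetical, the critical values 3.182 (df = 3) and 2.160 (df = 13) are table look-ups supplied by hand, and the p-value is approximated by integrating the Student's t density numerically with Simpson's rule.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| > |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the density between 0 and |t|
    return 1 - 2 * area           # both tails beyond |t|

# (label, r, n, two-tailed 5% critical value taken from a t table)
examples = [
    ("study hours and grades", 0.6138, 5, 3.182),
    ("reading time and TV", -0.715, 15, 2.160),
]

for label, r, n, t_crit in examples:
    t = t_statistic(r, n)
    p = two_tailed_p(t, n - 2)
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"{label}: t = {t:.2f}, df = {n - 2}, p = {p:.4f}, {decision}")
```

Comparing |t| with the critical value and comparing the p-value with α always lead to the same decision, so either check is enough on its own.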

Example: reading time and TV

Let us state the hypotheses: $H_0: \rho = 0$, $H_A: \rho \neq 0$. (A two-tailed test.)

The test statistic:

$$t = \frac{-0.715\sqrt{15-2}}{\sqrt{1-(-0.715)^2}} = -3.69$$

The number of degrees of freedom is df = 13.

The critical values: $\pm t_{13,\,0.025} = \pm 2.16$ (Sketch the curve and the regions of rejection.)

The p-value $\approx 2 \times 0.0014 \approx 0.0027 < 0.05 = \alpha$

Decision: reject $H_0$.

The confidence intervals

We start with the Fisher transformation:

$$Z = \frac{1}{2}\ln\frac{1+r}{1-r}$$

It turns out that, for a bivariate normal population, Z is (approximately) normally distributed with a standard deviation of $1/\sqrt{n-3}$.

So the confidence interval for $\mu_Z$ is

$$c = Z - \frac{z_{\alpha/2}}{\sqrt{n-3}} < \mu_Z < Z + \frac{z_{\alpha/2}}{\sqrt{n-3}} = d$$

The confidence intervals

Now we perform the inverse Fisher transformation to get the confidence interval for the population correlation ρ:

$$\frac{e^{2c}-1}{e^{2c}+1} < \rho < \frac{e^{2d}-1}{e^{2d}+1}$$

Make sure you can compute these quantities correctly on your scientific calculator! Let us consider examples.

Example: study hours and grades

Let us find the 95% confidence interval for ρ. Recall that we have r = 0.6138.

Do the Fisher transformation:

$$Z = \frac{1}{2}\ln\frac{1+0.6138}{1-0.6138} = 0.7150$$

$z_{\alpha/2} = 1.96$. The confidence interval for $\mu_Z$:

$$0.7150 - \frac{1.96}{\sqrt{2}} < \mu_Z < 0.7150 + \frac{1.96}{\sqrt{2}}$$

so

$$c = -0.6709 < \mu_Z < 2.1009 = d$$

Example: study hours and grades

Now we'll do the inverse Fisher transformation.

$$\frac{e^{2(-0.6709)}-1}{e^{2(-0.6709)}+1} < \rho < \frac{e^{2 \cdot 2.1009}-1}{e^{2 \cdot 2.1009}+1}$$

After computation, we find the 95% confidence interval for the population correlation coefficient ρ:

$$-0.5856 < \rho < 0.9705$$

Example: reading time and TV

Let us find the 94% confidence interval for ρ. Recall that we have r = -0.715.

Do the Fisher transformation:

$$Z = \frac{1}{2}\ln\frac{1+(-0.715)}{1-(-0.715)} = -0.8973$$

$z_{\alpha/2} = 1.88$. The confidence interval for $\mu_Z$:

$$-0.8973 - \frac{1.88}{\sqrt{12}} < \mu_Z < -0.8973 + \frac{1.88}{\sqrt{12}}$$

so

$$c = -1.4400 < \mu_Z < -0.3546 = d$$
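The two-step interval construction (Fisher transformation, then its inverse) can be sketched as follows. The function name `fisher_ci` is hypothetical, and the sketch relies on the identity $(e^{2x}-1)/(e^{2x}+1) = \tanh x$ for the inverse step. It also uses the exact normal quantile from `statistics.NormalDist` rather than a rounded table value such as 1.88, so the last digit of a result may differ slightly from a hand computation.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence):
    """Confidence interval for rho from a sample correlation r of n pairs."""
    # Fisher transformation: Z = (1/2) * ln((1 + r) / (1 - r))
    z = 0.5 * math.log((1 + r) / (1 - r))
    # Critical value z_{alpha/2} of the standard normal distribution
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Z is approximately normal with standard deviation 1 / sqrt(n - 3)
    half_width = z_crit / math.sqrt(n - 3)
    c, d = z - half_width, z + half_width
    # Inverse Fisher transformation: (e^{2x} - 1) / (e^{2x} + 1) = tanh(x)
    return math.tanh(c), math.tanh(d)

low, high = fisher_ci(0.6138, 5, 0.95)     # study hours and grades
print(f"{low:.4f} < rho < {high:.4f}")

low, high = fisher_ci(-0.715, 15, 0.94)    # reading time and TV
print(f"{low:.4f} < rho < {high:.4f}")
```

For the study-hours data this reproduces an interval of roughly -0.586 to 0.971. The interval is very wide and contains 0, which is consistent with failing to reject $H_0$ for a sample of only five pairs.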

Example: reading time and TV

Now we'll do the inverse Fisher transformation.

$$\frac{e^{2(-1.44)}-1}{e^{2(-1.44)}+1} < \rho < \frac{e^{2(-0.3546)}-1}{e^{2(-0.3546)}+1}$$

After computation, we find the 94% confidence interval for the population correlation coefficient ρ:

$$-0.8937 < \rho < -0.3404$$

Example: salt and anxiety (practice)

Is there a correlation between one's salt intake and one's level of stress and anxiety? A study of 32 volunteers has found a correlation coefficient of 0.26 between the participants' salt intake and the amplitude of their adrenaline spikes.

(a) Test at a 5% level of significance whether the population correlation is significant.
(b) Construct a 95% confidence interval for the population correlation coefficient.

Assume a bivariate normal population.

Example: immigration and GDP (practice)

Do immigration rates correlate with GDP (gross domestic product)? A researcher took data from 40 different countries and found the correlation coefficient for her sample to be equal to 0.44.

(a) Test at a 1% level of significance whether there is a significant correlation between immigration rates and GDP.
(b) Construct a 98% confidence interval for the population correlation coefficient.