Chapter Eight: Assessment of Relationships 1/42

8.1 Introduction 2/42 Background This chapter deals, primarily, with two topics. The Pearson product-moment correlation coefficient. The chi-square test for independence.

8.2 Pearson Product-Moment Correlation Coefficient 3/42 The Pearson Product-Moment Correlation Coefficient The conceptual equation for the P-M correlation coefficient is

r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right] \left[\sum (y - \bar{y})^2\right]}}

where r is the sample correlation coefficient, x and y are the two variables to be correlated, and n is the number of paired observations.

8.2 Pearson Product-Moment Correlation Coefficient 4/42 The Pearson Product-Moment Correlation Coefficient The computational form of r is given by

r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] \left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}

8.2 Pearson Product-Moment Correlation Coefficient 5/42 Example Calculate the P-M correlation coefficient for the access and wellness scores provided here.

Table: Wellness and access to medical care scores for 15 subjects.

Subject Number   Access Score   Wellness Score
 1               3              2
 2               6              6
 3               13             9
 4               1              1
 5               7              5
 6               8              7
 7               13             10
 8               10             8
 9               2              2
10               4              3
11               5              4
12               11             9
13               4              5
14               3              4
15               9              8

8.2 Pearson Product-Moment Correlation Coefficient 6/42 Solution Letting access scores equal x and wellness scores equal y, we note that \sum x = 99, \sum y = 83, \sum xy = 700, \sum x^2 = 869 and \sum y^2 = 575. By Equation 8.2,

r = \frac{700 - \frac{(99)(83)}{15}}{\sqrt{\left[869 - \frac{(99)^2}{15}\right] \left[575 - \frac{(83)^2}{15}\right]}} = \frac{152.20}{\sqrt{[215.60][115.73]}} = .964
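The arithmetic above can be checked directly. A minimal Python sketch (not part of the original slides) that applies the computational form of Equation 8.2 to the access (x) and wellness (y) scores from the table:

```python
import math

# Access (x) and wellness (y) scores for the 15 subjects from Slide 5
x = [3, 6, 13, 1, 7, 8, 13, 10, 2, 4, 5, 11, 4, 3, 9]
y = [2, 6, 9, 1, 5, 7, 10, 8, 2, 3, 4, 9, 5, 4, 8]
n = len(x)

# Sums required by the computational form of r
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = sum_xy - (sum_x * sum_y) / n
denominator = math.sqrt((sum_x2 - sum_x**2 / n) * (sum_y2 - sum_y**2 / n))
r = numerator / denominator
print(round(r, 3))  # -> 0.964
```

The same sums printed along the way (99, 83, 700, 869, 575) match the values used in the hand calculation.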

8.2 Pearson Product-Moment Correlation Coefficient 7/42 Information Provided By P-M The P-M coefficient provides two pieces of information about the relationship of x and y. Nature of the relationship. Strength of the relationship.

8.2 Pearson Product-Moment Correlation Coefficient 8/42 Nature Of The Relationship When P-M takes a positive value it indicates that high values of x tend to be associated with high values of y while low values of x tend to be associated with low values of y. When P-M takes a negative value it indicates that high values of x tend to be associated with low values of y while low values of x tend to be associated with high values of y.

8.2 Pearson Product-Moment Correlation Coefficient 9/42 Depiction Of Positive x y Relationship Figure: Bivariate plot of positively correlated health care access scores and wellness index scores. [Axes: Access Score (horizontal), Wellness Index Score (vertical)]

8.2 Pearson Product-Moment Correlation Coefficient 10/42 Depiction Of Negative x y Relationship Figure: Bivariate plot of negatively correlated percents of students on free or reduced lunch and percents using bicycle helmets in nine schools. [Axes: Free and Reduced Lunch (horizontal), Bicycle Helmet Wear (vertical)]

8.2 Pearson Product-Moment Correlation Coefficient 11/42 Strength Of The Relationship P-M can take values between -1.0 and 1.0. The P-M is at maximum strength when it takes the value 1.0 or -1.0. P-M loses strength as it recedes toward zero. At zero, P-M is at minimum strength.

8.2 Pearson Product-Moment Correlation Coefficient 12/42 P-M = 1.0 A correlation coefficient of 1.0 means that each subject made exactly the same score on the two variables when scaling differences are eliminated by expressing the two variables in terms of z scores. If you use Equation 8.2 to correlate x and y in the table on the following slide (#13), you will find that P-M = 1.0.

8.2 Pearson Product-Moment Correlation Coefficient 13/42 Data Set Where r = 1.0

Table: Data set where r = 1.0.

Subject Number   x    y
1                4    11
2                6    15
3                11   25
4                4    11
5                9    21
6                5    13
7                8    19
8                13   29

8.2 Pearson Product-Moment Correlation Coefficient 14/42 Data From Slide #13 Converted To z Scores

Table: x and y values from Slide #13 expressed as z scores.

Subject Number   z_x      z_y
1                -1.049   -1.049
2                -.449    -.449
3                1.049    1.049
4                -1.049   -1.049
5                .449     .449
6                -.749    -.749
7                .150     .150
8                1.648    1.648
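The z-score form makes the result transparent: r can also be written as the average cross-product of paired z scores, \sum z_x z_y / (n - 1), which must equal 1.0 when every pair of z scores is identical. A short Python sketch (assuming, as the slides do, sample standard deviations with n - 1 in the denominator) applied to the Slide 13 data:

```python
import math

x = [4, 6, 11, 4, 9, 5, 8, 13]
y = [11, 15, 25, 11, 21, 13, 19, 29]
n = len(x)

def z_scores(v):
    """Convert raw scores to z scores using the sample (n - 1) standard deviation."""
    mean = sum(v) / len(v)
    sd = math.sqrt(sum((s - mean) ** 2 for s in v) / (len(v) - 1))
    return [(s - mean) / sd for s in v]

zx, zy = z_scores(x), z_scores(y)

# r as the average cross-product of paired z scores
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(zx[0], 3), round(r, 3))  # subject 1's z score is -1.049; r comes out 1.0
```

Every z_x equals its partner z_y here, so each cross-product is a squared z score and the average is exactly 1.0.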

8.2 Pearson Product-Moment Correlation Coefficient 15/42 Plot Of x and y When r = 1.0 Because P-M assesses the degree to which x and y are linearly related, when r = 1.0 the bivariate plot of x and y shows the individual points falling on a positively sloped line. This can be seen in the figure on the next slide (#16). When r is positive but less than 1.0, the plot suggests a linear relationship, but the points do not all fall on a single line, showing that the linear relationship is not perfect.

8.2 Pearson Product-Moment Correlation Coefficient 16/42 Bivariate Plot For Data Where r = 1.0 Figure: Bivariate plot of data from Slide 13 for which r = 1.0.

8.2 Pearson Product-Moment Correlation Coefficient 17/42 P-M = -1.0 When r = -1.0, the magnitude of each subject's z score on the two variables is the same but is always opposite in sign. Thus, if a subject is 1.5 standard deviations above the mean on variable x, she will be 1.5 standard deviations below the mean on variable y. If you use Equation 8.2 to correlate x and y in the table on the following slide (#18), you will find that P-M = -1.0.

8.2 Pearson Product-Moment Correlation Coefficient 18/42 Data Set Where r = -1.0

Table: Data set where r = -1.0.

Subject Number   x    y
1                4    29
2                6    25
3                11   15
4                4    29
5                9    19
6                5    27
7                8    21
8                13   11

8.2 Pearson Product-Moment Correlation Coefficient 19/42 Data From Slide #18 Converted To z Scores

Table: x and y values from Slide #18 expressed as z scores.

Subject Number   z_x      z_y
1                -1.049   1.049
2                -.449    .449
3                1.049    -1.049
4                -1.049   1.049
5                .449     -.449
6                -.749    .749
7                .150     -.150
8                1.648    -1.648

8.2 Pearson Product-Moment Correlation Coefficient 20/42 Plot Of x and y When r = -1.0 Because P-M assesses the degree to which x and y are linearly related, when r = -1.0 the bivariate plot of x and y shows the individual points falling on a negatively sloped line. This can be seen in the figure on the next slide (#21). When r is negative but greater than -1.0, the plot suggests a linear relationship, but the points do not all fall on a single line, showing that the linear relationship is not perfect.

8.2 Pearson Product-Moment Correlation Coefficient 21/42 Bivariate Plot For Data Where r = -1.0 Figure: Bivariate plot of data from Slide 18 for which r = -1.0.

8.2 Pearson Product-Moment Correlation Coefficient 22/42 Cause-Effect Relationships The fact that two variables are correlated should not be used as evidence that one variable causes the other. Two variables may be correlated because one variable causes the other. Two variables may also be correlated without one of the two variables causing the other, as when both are influenced by some third variable.

8.2 Pearson Product-Moment Correlation Coefficient 23/42 Test Of H_0: \rho = 0 A test of H_0: \rho = 0 may be carried out by means of the following test statistic.

t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}

where r is the P-M correlation coefficient and n is the number of pairs of observations. The degrees of freedom for the test critical value are n - 2.

8.2 Pearson Product-Moment Correlation Coefficient 24/42 Test Of H_0: \rho = 0 (continued) Alternative hypotheses.

Two-Tailed: H_A: \rho \neq 0
One-Tailed: H_A: \rho < 0
One-Tailed: H_A: \rho > 0

8.2 Pearson Product-Moment Correlation Coefficient 25/42 Example The correlation between 15 pairs of wellness and access to medical care scores is.964. Use this information to test the hypothesis H 0 : ρ = 0 against the alternative H A : ρ > 0.

8.2 Pearson Product-Moment Correlation Coefficient 26/42 Solution Obtained t is

t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.964}{\sqrt{\frac{1 - .964^2}{15 - 2}}} = 13.072

Reference to Appendix B shows that critical t for a one-tailed test conducted at \alpha = .05 with 13 degrees of freedom is 1.771. Because obtained t exceeds this value, the null hypothesis is rejected and the researcher concludes that the population correlation is greater than 0.
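As a quick check on the computation, the same t can be produced in a couple of lines of Python (r = .964 and n = 15 come from the example above):

```python
import math

r, n = 0.964, 15

# t statistic for testing H0: rho = 0, with n - 2 degrees of freedom
t = r / math.sqrt((1 - r**2) / (n - 2))
print(round(t, 2))  # -> 13.07
```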

8.2 Pearson Product-Moment Correlation Coefficient 27/42 Test Of H_0: \rho = \rho_0 A test of H_0: \rho = \rho_0 may be carried out by means of the following test statistic.

Z = \frac{.5 \ln\left(\frac{1 + r}{1 - r}\right) - .5 \ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)}{\sqrt{\frac{1}{n - 3}}}

Here, ln is the natural log, \rho_0 is the hypothesized value of the population correlation coefficient and n is the number of pairs. This statistic is approximately normally distributed so that the test may be conducted by reference to the normal curve.

8.2 Pearson Product-Moment Correlation Coefficient 28/42 Example A researcher knows that the correlation between a shortened form of the Attitude Toward Risky Sexual Behaviors Assessment Scale and a more elaborate form of the scale is.57. After modifying the shortened version and administering it along with the elaborate version to 18 subjects the researcher finds that the newly modified form correlates.71 with the elaborate form. Use this information to perform a two-tailed test of the null hypothesis H 0 : ρ =.57

8.2 Pearson Product-Moment Correlation Coefficient 29/42 Solution By Equation 8.5,

Z = \frac{.5 \ln\left(\frac{1 + .71}{1 - .71}\right) - .5 \ln\left(\frac{1 + .57}{1 - .57}\right)}{\sqrt{\frac{1}{18 - 3}}} = \frac{.887 - .648}{.258} = .926

8.2 Pearson Product-Moment Correlation Coefficient 30/42 Solution (continued) From Appendix A it can be seen that the critical Z values for a two-tailed test conducted at \alpha = .05 are -1.96 and 1.96. Because obtained Z is between these two values, the null hypothesis is not rejected. This means that the researcher has been unable to demonstrate a change in correlation after modifying the short form of the scale.
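The Fisher-transformed terms in Equation 8.5 are each .5 ln((1 + r)/(1 - r)), which is exactly math.atanh, so the statistic is easy to verify. A small sketch; carrying full precision gives Z near .93 rather than the .926 obtained above from rounded intermediate values:

```python
import math

r, rho0, n = 0.71, 0.57, 18

# 0.5 * ln((1 + r) / (1 - r)) is the Fisher transformation, i.e. atanh(r)
z = (math.atanh(r) - math.atanh(rho0)) / math.sqrt(1 / (n - 3))
print(round(z, 3))  # about 0.93; well inside the (-1.96, 1.96) acceptance region
```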

8.2 Pearson Product-Moment Correlation Coefficient 31/42 CI For Estimation Of \rho A confidence interval for the estimation of \rho is provided by the following equations.

L = \frac{(1 + F)\,r + (1 - F)}{(1 + F) + (1 - F)\,r} \qquad (8.6)

U = \frac{(1 + F)\,r - (1 - F)}{(1 + F) - (1 - F)\,r} \qquad (8.7)

In these equations r is the sample correlation coefficient and F is the appropriate value from Appendix C. The degrees of freedom for F are n - 2 for both numerator and denominator degrees of freedom.

8.2 Pearson Product-Moment Correlation Coefficient 32/42 Example Use the Risky Sexual Behaviors Assessment Scale data alluded to previously to form a two-sided 95% confidence interval for the estimation of ρ. The observed correlation in that study was.71 which was obtained from data collected on 18 subjects. Use the resulting interval to perform a two-tailed test of H 0 : ρ =.57 at α =.05. How did you obtain your result?

8.2 Pearson Product-Moment Correlation Coefficient 33/42 Solution By Equations 8.6 and 8.7,

L = \frac{(1 + 2.76)(.71) + (1 - 2.76)}{(1 + 2.76) + (1 - 2.76)(.71)} = \frac{.910}{2.510} = .363

and

U = \frac{(1 + 2.76)(.71) - (1 - 2.76)}{(1 + 2.76) - (1 - 2.76)(.71)} = \frac{4.430}{5.010} = .884

F = 2.76 was obtained by entering Appendix C for a two-sided confidence interval with numerator and denominator degrees of freedom of 18 - 2 = 16. The researcher can, therefore, be 95 percent confident that the population correlation coefficient lies between .363 and .884. The two-tailed null hypothesis would not be rejected because the hypothesized value of .57 lies between the two limits.
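A Python sketch of Equations 8.6 and 8.7, taking F = 2.76 as read from the text's Appendix C. At full precision the lower limit comes out near .362 rather than the .363 shown above; the difference is due only to rounding of intermediate values:

```python
r, F = 0.71, 2.76  # F for a two-sided 95% interval with 16 and 16 df, per Appendix C

# Confidence limits for rho (Equations 8.6 and 8.7)
L = ((1 + F) * r + (1 - F)) / ((1 + F) + (1 - F) * r)
U = ((1 + F) * r - (1 - F)) / ((1 + F) - (1 - F) * r)
print(round(L, 3), round(U, 3))  # about 0.362 and 0.884
```

Because .57 lies inside (L, U), the two-tailed test of H_0: \rho = .57 at \alpha = .05 is not rejected, matching the duality between intervals and tests used in the example.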

8.3 The Chi-Square Test For Independence 34/42 Chi-Square Test: Introduction The chi-square test for independence is used to test the null hypothesis that two discrete variables are independent against the alternative that they are not independent. The chi-square test for independence is simply a more general form of the 2 by k chi-square test.

8.3 The Chi-Square Test For Independence 35/42 The Test Statistic Obtained chi-square is calculated by the following:

\chi^2 = \sum_{\text{all cells}} \left[\frac{(f_o - f_e)^2}{f_e}\right]

Here f_o and f_e are the observed and expected frequencies respectively. The f_e are the numbers to be expected in each cell if the null hypothesis is true and are computed as follows.

f_e = \frac{(N_R)(N_C)}{N}

Here N_R is the row total for the cell whose expected frequency is being calculated and N_C is the column total for the same cell.

8.3 The Chi-Square Test For Independence 36/42 The Test Statistic (continued) The degrees of freedom for the \chi^2 test statistic are computed by

df = (j - 1)(k - 1)

where j and k are the number of rows and columns in the table respectively.

8.3 The Chi-Square Test For Independence 37/42 j By k Chi-Square Table

Table: Depiction of a j by k chi-square table. Each cell holds an observed frequency f_o with its expected frequency f_e in parentheses.

                                Variable One
                   Category One     Category Two     ...   Category k
Variable  Cat. One  f_o11 (f_e11)   f_o12 (f_e12)    ...   f_o1k (f_e1k)
Two       Cat. Two  f_o21 (f_e21)   f_o22 (f_e22)    ...   f_o2k (f_e2k)
          ...
          Cat. j    f_oj1 (f_ej1)   f_oj2 (f_ej2)    ...   f_ojk (f_ejk)

8.3 The Chi-Square Test For Independence 38/42 Example Suppose a survey is conducted in three rural counties to determine vaccination status against hepatitis B. It is found that in county one, 41 persons have been vaccinated, 126 have not been vaccinated and 452 do not know their status. In county two, 202 have been vaccinated, 210 have not been vaccinated and 440 do not know their status. In the last county, 330 have been vaccinated, 614 have not been vaccinated, and 680 do not know their status. Use these data to perform a chi-square analysis. Interpret the results.

8.3 The Chi-Square Test For Independence 39/42 Solution The data are arranged for chi-square analysis as follows.

Table: Data from survey arranged for chi-square analysis. Observed frequencies appear in brackets; expected frequencies appear in parentheses.

                Vaccinated       Not Vaccinated    Unknown           Total
County One      [41] (114.60)    [126] (190.00)    [452] (314.40)    619
County Two      [202] (157.74)   [210] (261.52)    [440] (432.74)    852
County Three    [330] (300.66)   [614] (498.48)    [680] (824.86)    1624
Total           573              950               1572              N = 3095

8.3 The Chi-Square Test For Independence 40/42 Solution (continued) Calculations of expected values for the cells in the first row are as follows.

f_{e11} = \frac{(N_{One})(N_V)}{N} = \frac{(619)(573)}{3095} = 114.60

f_{e12} = \frac{(N_{One})(N_{NV})}{N} = \frac{(619)(950)}{3095} = 190.00

f_{e13} = \frac{(N_{One})(N_U)}{N} = \frac{(619)(1572)}{3095} = 314.40

8.3 The Chi-Square Test For Independence 41/42 Solution (continued) Obtained chi-square is then

\chi^2 = \sum_{\text{all cells}} \left[\frac{(f_o - f_e)^2}{f_e}\right]
       = \frac{(41 - 114.60)^2}{114.60} + \frac{(126 - 190.00)^2}{190.00} + \frac{(452 - 314.40)^2}{314.40}
       + \frac{(202 - 157.74)^2}{157.74} + \frac{(210 - 261.52)^2}{261.52} + \frac{(440 - 432.74)^2}{432.74}
       + \frac{(330 - 300.66)^2}{300.66} + \frac{(614 - 498.48)^2}{498.48} + \frac{(680 - 824.86)^2}{824.86}
       = 47.27 + 21.56 + 60.22 + 12.42 + 10.15 + .12 + 2.86 + 26.77 + 25.44
       = 206.81
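The whole analysis, expected frequencies included, can be reproduced from the observed table alone. A minimal Python sketch:

```python
# Observed frequencies: rows are counties, columns are
# vaccinated / not vaccinated / unknown status
observed = [
    [41, 126, 452],   # county one
    [202, 210, 440],  # county two
    [330, 614, 680],  # county three
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / N  # expected frequency (N_R * N_C / N)
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)  # -> 206.81 and 4
```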

8.3 The Chi-Square Test For Independence 42/42 Solution (continued) The critical value is obtained by entering Appendix D with (j - 1)(k - 1) = (3 - 1)(3 - 1) = 4 degrees of freedom. Appendix D shows that for \alpha = .05 and four degrees of freedom, critical \chi^2 is 9.488. Because 206.81 is greater than 9.488, the null hypothesis is rejected. We conclude that vaccination status and county of residence are not independent; that is, vaccination status is related to county of residence. Said yet another way, the proportions of vaccinated, not vaccinated, and unknown-status persons in the three counties are not the same.