Chapter 9 - Correlation and Regression


9.1 Scatter diagram of percentage of LBW infants (Y) and high-risk fertility rate (X1) in Vermont Health Planning Districts.

9.3 Correlation between percentage of LBW infants (Y) and percentage of births to unmarried mothers (X2) in Vermont Health Planning Districts:

N = 10    s_Y = 0.698    s_X2 = 1.322    cov(X2, Y) = 0.389

r = cov(X2, Y) / (s_X2 × s_Y) = 0.389 / ((1.322)(0.698)) = .42

9.5 Three sets of data:
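The covariance-to-correlation step in Exercise 9.3 is easy to verify by script. A minimal Python check, assuming the summary values s_X2 = 1.322, s_Y = 0.698, and cov = 0.389 as read from the solution above:

```python
def pearson_r_from_cov(cov_xy, s_x, s_y):
    """Pearson r is just the covariance rescaled by the two standard deviations."""
    return cov_xy / (s_x * s_y)

# Exercise 9.3: %LBW infants (Y) vs. % births to unmarried mothers (X2)
r = pearson_r_from_cov(0.389, 1.322, 0.698)
print(round(r, 2))  # 0.42
```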

9.7 Correlations for the three data sets in Exercise 9.5:

a. r = cov(X, Y) / (s_X × s_Y). For each set, s_X = 1.826 and s_Y = 2.582.

Set 1: r = 4.67 / ((1.826)(2.582)) = .99
Set 2: r = 3.33 / ((1.826)(2.582)) = .71
Set 3: r = −4.67 / ((1.826)(2.582)) = −.99

b. Three arrangements of Y will result in the lowest possible positive correlation: 2 8 6 4, or 6 4 2 8, or 6 2 8 4 [r = .4].

9.9 Cerebral hemorrhage in low-birthweight infants and cognitive deficit at age 5:

a. Power calculation, with d = .20 and N = 25:

δ = d√(N − 1) = .20√24 = 0.98, so power = .17.

b. For power = .80, δ = 2.80:

δ = d√(N − 1): 2.80 = .20√(N − 1), so N − 1 = 196 and N = 197.

9.11 Standard error of estimate for the regression equation in Exercise 9.10: N = 10, r = .62 [calculated in Exercise 9.12].
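The power calculations in Exercises 9.9 and 9.21 both rest on the δ = d√(N − 1) convention for correlations used in this chapter's solutions. A short Python sketch of that arithmetic:

```python
import math

def delta(d, n):
    """Noncentrality parameter for a correlation: delta = d * sqrt(N - 1)."""
    return d * math.sqrt(n - 1)

def n_for_power(target_delta, d):
    """Smallest N reaching the target delta; round() guards against float noise."""
    return math.ceil(round((target_delta / d) ** 2 + 1, 6))

print(round(delta(0.20, 25), 2))   # 0.98  (Exercise 9.9a)
print(n_for_power(2.80, 0.20))     # 197   (Exercise 9.9b)
print(n_for_power(2.80, 0.40))     # 50    (Exercise 9.21)
```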

s_Y.X = s_Y √((1 − r²)(N − 1)/(N − 2)) = 0.698 √((1 − .62²)(9/8)) = .580

9.13 If the high-risk fertility rate in Exercise 9.10 jumped to 70:

Ŷ = bX + a = (0.069)(70) + 3.53 = 8.36

would be the predicted percentage of LBW infants born.

9.15 Number of symptoms predicted for a stress score of 8 using the data in Table 9.2:

Regression equation: Ŷ = 0.0086(X) + 4.30
If the stress score (X) = 8: Ŷ = 0.0086(8) + 4.30 = 4.37
The predicted ln(symptoms) score is 4.37.

9.17 Confidence intervals on Ŷ. I will calculate them for X incrementing between 0 and 60 in steps of 10.

CI(Ŷ) = Ŷ ± (t_α/2)(s′_Y.X)

s′_Y.X = s_Y.X √(1 + 1/N + (X_i − X̄)²/((N − 1)s²_X)) = 0.726 √(1 + 1/107 + (X_i − X̄)²/((106)(56.05)))

Ŷ = 0.00856X + 4.30    t_α/2 = 1.983

For several different values of X, calculate Ŷ and s′_Y.X and plot the results:

X        0      10     20     30     40     50     60
Ŷ        4.300  4.386  4.471  4.557  4.642  4.728  4.814
s′_Y.X   0.757  0.741  0.734  0.738  0.752  0.776  0.801
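The interval computation in Exercise 9.17 can be reproduced in Python. This is a sketch under stated assumptions: the mean stress score X̄ = 21.467 is not preserved in the transcription and is assumed here for illustration, while s_Y.X = 0.726, N = 107, and s²_X = 56.05 are read from the solution.

```python
import math

def se_pred(s_yx, n, x, x_bar, s2_x):
    """Standard error for predicting Y at X = x (the 1 + 1/N + ... form)."""
    return s_yx * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ((n - 1) * s2_x))

def ci_yhat(x, b=0.00856, a=4.30, t=1.983,
            s_yx=0.726, n=107, x_bar=21.467, s2_x=56.05):
    """95% interval on the predicted ln(symptoms) score at a given stress X."""
    y_hat = b * x + a
    half = t * se_pred(s_yx, n, x, x_bar, s2_x)
    return round(y_hat - half, 2), round(y_hat + half, 2)

for x in range(0, 61, 10):
    print(x, ci_yhat(x))
```

The standard errors this produces match the tabled values to about the second decimal; the third decimal depends on the assumed mean stress score.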

The curvature is hard to see, but it is there, as can be seen in the graphic on the right, which plots the width of the interval as a function of X. (It's fun to play with R.)

9.19 When data are standardized, the slope equals r. Therefore the slope will be less than one for all but the most trivial case, and predicted deviations from the mean will be less than actual parental deviations.

9.21 Number of subjects needed in Exercise 9.20 for power = .80:

For power = .80, δ = 2.80.

δ = ρ√(N − 1): 2.80 = .40√(N − 1), so N − 1 = 49 and N = 50.

9.23 Katz et al. correlations with SAT scores.

a. r1 = .68 → r′1 = .829    r2 = .51 → r′2 = .563

z = (r′1 − r′2) / √(1/(N1 − 3) + 1/(N2 − 3)) = (.829 − .563) / √(1/14 + 1/25) = 0.797

The correlations are not significantly different from each other.

b. We do not have reason to argue that the relationship between performance and prior test scores is affected by whether or not the student read the passage.
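Exercise 9.23's comparison of two independent correlations uses Fisher's r-to-z transform. A Python sketch, where the sample sizes N1 = 17 and N2 = 28 are inferred from the divisors 14 and 25 in the solution:

```python
import math

def fisher_z(r):
    """Fisher's r-to-z (often written r') transform."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_for_two_rs(r1, n1, r2, n2):
    """Normal deviate testing H0: rho1 = rho2 with independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

print(round(z_for_two_rs(0.68, 17, 0.51, 28), 3))  # 0.798, well short of 1.96
```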

9.25 It is difficult to tell whether the significant difference between the results of the two previous problems is attributable to the larger sample sizes or to the higher (and thus more different) values of r′. It is likely to be the former.

9.27 Moore and McCabe example of alcohol and tobacco use:

Correlations (SPSS):

                                  ALCOHOL   TOBACCO
ALCOHOL   Pearson Correlation     1.000     .224
          Sig. (2-tailed)         .         .509
          N                       11        11
TOBACCO   Pearson Correlation     .224      1.000
          Sig. (2-tailed)         .509      .
          N                       11        11

b. The data suggest that people from Northern Ireland actually drink relatively little.

[Scatterplot of Alcohol use (3.5–6.5) against Tobacco use (2.5–5.0)]

c. With Northern Ireland excluded from the data, the correlation is .784, which is significant at p = .007.

9.29 a. The correlations range between .40 and .80.

b. The subscales are not measuring independent aspects of psychological well-being.
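Exercise 9.27(c) is a reminder that a single discrepant case can mask a real relationship. A small Python illustration with made-up numbers (not the Moore and McCabe data), where adding one Northern-Ireland-like point weakens an otherwise strong correlation:

```python
def pearson_r(x, y):
    """Pearson correlation from raw paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Illustrative spending figures only
tobacco = [3.0, 3.3, 3.6, 3.9, 4.2, 4.5]
alcohol = [4.2, 4.5, 4.9, 5.2, 5.6, 6.0]
print(round(pearson_r(tobacco, alcohol), 3))          # strong positive

tobacco_out = tobacco + [4.6]   # heavy tobacco use...
alcohol_out = alcohol + [4.0]   # ...but little drinking, like N. Ireland
print(round(pearson_r(tobacco_out, alcohol_out), 3))  # much weaker
```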

9.31 Relationship between height and weight for males:

[Scatterplot of Weight (90–225 pounds) against Height (60–80 inches) for males]

The regression solution that follows was produced by SPSS and gives all relevant results.

Model Summary (Gender = Male; Predictors: (Constant), HEIGHT)

R      R Square   Adjusted R Square   Std. Error of the Estimate
.604   .364       .353                14.997

ANOVA (Dependent Variable: WEIGHT; Gender = Male)

Source       Sum of Squares   df   Mean Square   F        Sig.
Regression   7087.800         1    7087.800      31.536   .000
Residual     12361.253        55   224.750
Total        19449.053        56
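The entries of an ANOVA table are linked by simple identities (F = MS_reg/MS_res and R² = SS_reg/SS_total), which allows a quick consistency check. Python, using the sums of squares as I read them from the output above after restoring digits the transcription dropped:

```python
# ANOVA identities for the males-only regression of WEIGHT on HEIGHT
ss_reg, df_reg = 7087.800, 1
ss_res, df_res = 12361.253, 55

ms_res = ss_res / df_res               # mean square residual
f_ratio = (ss_reg / df_reg) / ms_res   # F = MS_reg / MS_res
r_squared = ss_reg / (ss_reg + ss_res)

print(round(ms_res, 3), round(f_ratio, 3), round(r_squared, 3))
```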

Coefficients (Dependent Variable: WEIGHT; Gender = Male)

              Unstandardized Coefficients   Standardized
              B           Std. Error        Beta    t        Sig.
(Constant)    -149.934    54.917                    -2.730   .008
HEIGHT        4.356       .776              .604    5.616    .000

With a slope of 4.36, the data predict that two males who differ by one inch will also differ by approximately 4 1/3 pounds. The intercept has no meaning because people are not 0 inches tall, but the fact that it is so largely negative suggests that there is some curvilinearity in this relationship for low values of Height. Tests on the correlation and the slope are equivalent tests when we have one predictor, and these tests tell us that both are significant. Weight increases reliably with increases in height.

9.33 As a 5′8″ male, my predicted weight is Ŷ = 4.356(Height) − 149.934 = 4.356(68) − 149.934 = 146.27 pounds.

a. I weigh 146 pounds. (Well, I did two years ago.) Therefore the residual in the prediction is Y − Ŷ = 146 − 146.27 = −0.27.

b. If the students on which this equation is based under- or over-estimated their own height or weight, the prediction for my weight will be based on invalid data and will be systematically in error.

9.35 The male would be predicted to weigh 137.562 pounds, while the female would be predicted to weigh 125.354 pounds. The predicted difference between them would be 12.208 pounds.
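Exercise 9.33's prediction and residual follow directly from the coefficients table. A quick Python version, taking the slope 4.356 and intercept −149.934 as read from the output above:

```python
def predicted_weight(height_inches):
    """Fitted males-only equation: weight = 4.356 * height - 149.934."""
    return 4.356 * height_inches - 149.934

y_hat = predicted_weight(68)      # a 5'8" male
residual = 146 - y_hat            # actual weight of 146 pounds
print(round(y_hat, 2), round(residual, 2))  # 146.27 -0.27
```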

Although the regression line has a slight positive slope, the slope is not significantly different from zero. This is shown below.

DEP VAR: TRIAL   N: 100   MULTIPLE R: 0.181   SQUARED MULTIPLE R: 0.033
ADJUSTED SQUARED MULTIPLE R: 0.023
STANDARD ERROR OF ESTIMATE: 28.67506

VARIABLE   COEFFICIENT   STD ERROR   STD COEF   TOLERANCE   T         P(2 TAIL)
CONSTANT   22.84259      5.94843     0.00000    .           3.84010   0.00021
RXTIME     0.42862       0.23465     0.18146    1.00000     1.82665   0.07080

ANALYSIS OF VARIANCE
SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO   P
REGRESSION   2743.58452       1    2743.58452    3.33664   0.07080
RESIDUAL     80581.45481      98   822.25934

There is no systematic linear or cyclical trend over time, and we would probably be safe in assuming that the observations can be treated as if they were independent. Any slight dependency would not alter our results to a meaningful degree.
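The same independence check can be scripted: regress the observations on trial number and test the slope against zero. A self-contained Python sketch with illustrative numbers, since the study's reaction times are not reproduced here:

```python
import math

def slope_t_test(x, y):
    """Least-squares slope of y on x and its t statistic (H0: slope = 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    return b, b / se_b

trials = list(range(1, 11))
rt = [63, 70, 58, 66, 72, 61, 68, 64, 69, 65]  # illustrative reaction times
b, t_stat = slope_t_test(trials, rt)
print(round(b, 3), round(t_stat, 3))
```

With these made-up numbers the slope is about 0.22 with t ≈ 0.44, nowhere near significance, mirroring the conclusion above.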