TOPIC 9 SIMPLE REGRESSION & CORRELATION


Basic Linear Relationships

Mathematical representation: Y = a + bX

X is the independent variable (the input variable, whose value we can choose).
Y is the dependent variable (the output variable, whose value is determined by the relationship).
a is the constant.
b is the coefficient of X.

Graphical representation: a straight line with vertical intercept a and slope b (Y rises by b for each 1-unit increase in X).

[Figure: graph of the line Y = a + bX, showing the intercept a and the slope b.]

In real situations, X and Y are replaced by more meaningful variables.

Example: A tradesperson charges $30 per hour plus a callout fee of $20. This can be represented mathematically as

C = 20 + 30T

where T is the time in hours and C is the total cost in dollars.

[Figure: graph of C = 20 + 30T, with T from 0 to 5 hours on the horizontal axis and C from 0 to 160 dollars on the vertical axis.]

From either representation it is possible to determine that the total cost of a 4-hour job would be $140.
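The cost model above can be checked with a few lines of code (a sketch; the function name is ours, not from the notes):

```python
def total_cost(hours):
    """Tradesperson's charge: $20 callout fee plus $30 per hour."""
    return 20 + 30 * hours

print(total_cost(4))  # -> 140, the cost of a 4-hour job
```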

Simple Regression: Finding Basic Linear Relationships from Real Data

Example: Suppose we kept records of the time it takes (in hours) to load removal vans and the number of rooms in each house. We have the following sample of cases:

ROOMS  TIME
  3      4
  3      5
  4      5
  4      7
  5      6
  5      7
  6      6
  6      8
  7      7
  7      8

These can be plotted on a scatter diagram. Then, using a ruler, we can find a reasonably well-fitting line that passes through this data.

[Figure: scatter diagram of TIME (0 to 9 hours) against ROOMS (0 to 8), with a hand-fitted line suggesting a slope of b = 0.75 and an intercept of a = 2.9.]

We can then estimate the values of a and b and obtain the mathematical representation below:

TIME = 2.9 + 0.75 ROOMS (approximately)
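Instead of fitting by ruler, the least-squares slope and intercept can be computed directly from the data. Below is a minimal sketch of the standard formulas b = Sxy / Sxx and a = ybar - b * xbar (the variable names are ours):

```python
rooms = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
time  = [4, 5, 5, 7, 6, 7, 6, 8, 7, 8]

n = len(rooms)
mean_x = sum(rooms) / n   # 5.0
mean_y = sum(time) / n    # 6.3

# Least-squares formulas: slope b = Sxy / Sxx, intercept a = ybar - b * xbar
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rooms, time))
sxx = sum((x - mean_x) ** 2 for x in rooms)
b = sxy / sxx
a = mean_y - b * mean_x

print(round(a, 3), round(b, 3))  # -> 2.8 0.7
```

These exact least-squares values (a = 2.8, b = 0.7) are close to the ruler estimates of 2.9 and 0.75, and match the computer output shown next.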

Alternatively, we can feed the raw data into a computer and run a regression program, which will find the values of a and b automatically. Typical output from such a program is given below. Before we believe and use any computer output, we must interpret and test it. This involves asking important questions and knowing what to look for.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .780a   .609       .560                .8874

ANOVA
Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression   9.800            1    9.800        12.444   .008
   Residual     6.300            8     .787
   Total       16.100            9

Coefficients
               Unstandardised Coefficients   Standardised Coefficients
Model          B        Std. Error           Beta        t       Sig.
1  (Constant)  2.800    1.031                            2.716   .026
   ROOMS        .700     .198                .780        3.528   .008
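The key numbers in this output can be reproduced from the raw data. The sketch below (our variable names) rebuilds the sums of squares behind the ANOVA table and the Model Summary, starting from the fitted coefficients a = 2.8 and b = 0.7:

```python
rooms = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
time  = [4, 5, 5, 7, 6, 7, 6, 8, 7, 8]

n = len(rooms)
mean_y = sum(time) / n
a, b = 2.8, 0.7                  # fitted intercept and slope

# Sums of squares behind the ANOVA table
pred = [a + b * x for x in rooms]
ss_total = sum((y - mean_y) ** 2 for y in time)                  # 16.100
ss_resid = sum((y - yhat) ** 2 for y, yhat in zip(time, pred))   #  6.300
ss_regr  = ss_total - ss_resid                                   #  9.800

r_square = ss_regr / ss_total    # R Square in the Model Summary
ms_resid = ss_resid / (n - 2)    # residual Mean Square
f_stat   = ss_regr / ms_resid    # F in the ANOVA table
std_err  = ms_resid ** 0.5       # Std. Error of the Estimate

print(round(r_square, 3), round(f_stat, 3), round(std_err, 4))
# -> 0.609 12.444 0.8874
```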

Question 1: Does any linear relationship exist in the data?

To find the answer we look in the ANOVA table for the Sig. (p-value) and perform an F-test:

Ho: There is no linear relationship.
Ha: There is some linear relationship.
Alpha: .05
Rule: If the Sig. (p-value) of F < Alpha, then conclude Ha.
Conclusion: Sig. = .008 < .05. Therefore there is some linear relationship.

Question 2: How strong is this relationship?

We look at R-SQUARE, the coefficient of determination. It always lies between 0 and 1. Values close to 0 indicate a very weak relationship; values close to 1 suggest a very strong relationship. Other values in between can be described using appropriate language, e.g. 0.7 might be called moderately strong.

Statistical interpretation of R-square: an R-square value of 0.7 means that the independent variable accounts for 70% of the variation in the dependent variable. Another way of looking at it is that the independent variable supplies 70% of the information needed to accurately predict the dependent variable. In other words, the model is 70% complete.

In our removalist example: R-square = .609, so the number of ROOMS accounts for 60.9% of the variation in TIME.

Question 3: What is the relationship?

To formulate it, we find the constant and the coefficient of the independent variable in the computer output. We can then write down the full model / equation / formula:

TIME = 2.8 + 0.7 ROOMS

This can be used for making predictions. For example, the predicted time to load the van when moving a 5-room house is:

TIME = 2.8 + 0.7 x 5 = 6.3 HOURS

However, the answer we get is only an average or approximate value, because the formula cannot predict perfectly. In practice, we must apply a safety margin to give us a prediction interval.

Prediction Interval

The safety margin we need to apply depends on:
1. Our desired level of confidence.
2. The position of our independent-variable value in relation to the centre of our data.
3. The standard error of estimate.

When our sample is large and we are interpolating (as we generally should be), the following simplified formula may be used:

Y = Y-hat +/- 2 x (Standard Error of Estimate)

In our removalist example: TIME = 6.3 +/- 2 x (.8874) = 6.3 +/- 1.7748 hours, with approximately 95% confidence.
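The simplified interval can be wrapped in a small helper (a sketch; predict_interval is our name, and the +/-2 rule is the large-sample approximation from the notes, not an exact interval):

```python
def predict_interval(rooms, se=0.8874):
    """Rough 95% prediction interval for loading time:
    fitted value +/- 2 standard errors of estimate (large-sample rule)."""
    fit = 2.8 + 0.7 * rooms
    return fit - 2 * se, fit + 2 * se

lo, hi = predict_interval(5)
print(round(lo, 4), round(hi, 4))  # -> 4.5252 8.0748
```

So for a 5-room house the predicted 6.3 hours carries a margin of about +/-1.77 hours.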

Correlation

Correlation describes the degree of association between two variables without regard to which variable is independent or dependent, and with no intention of formulating the relationship or using it to make predictions.

Typical computer output:

Correlations
                             ROOMS    TIME
ROOMS  Pearson Correlation   1.000    .780**
       Sig. (2-tailed)       .        .008
       N                     10       10
TIME   Pearson Correlation   .780**   1.000
       Sig. (2-tailed)       .008     .
       N                     10       10
**. Correlation is significant at the 0.01 level (2-tailed).

The value of a correlation coefficient lies between -1 and 1. Negative values indicate a downward-sloping relationship; positive values indicate an upward-sloping relationship. How close the value is to either extreme measures how closely the actual data fit a straight line.

[Figure: five scatter diagrams illustrating r = -1, r = -0.6, r = 0, r = 0.9 and r = 1.]
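The .780 reported by the computer can be reproduced with the usual formula r = Sxy / sqrt(Sxx * Syy). A minimal sketch using the removalist data (variable names are ours):

```python
rooms = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
time  = [4, 5, 5, 7, 6, 7, 6, 8, 7, 8]

n = len(rooms)
mx, my = sum(rooms) / n, sum(time) / n

# Pearson correlation: r = Sxy / sqrt(Sxx * Syy)
sxy = sum((x - mx) * (y - my) for x, y in zip(rooms, time))
sxx = sum((x - mx) ** 2 for x in rooms)
syy = sum((y - my) ** 2 for y in time)
r = sxy / (sxx * syy) ** 0.5

print(f"{r:.3f}")  # -> 0.780, matching the computer output
```

Note that r = .780 is also the square root of R-square (.609) from the regression output, as it must be in simple regression.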

Testing the Correlation

Because our correlation coefficient is calculated only from a sample, we can never be certain that the same value applies to the population. In fact, regardless of the sample correlation, there might be no linear association in the population at all. Therefore the correlation must be tested.

Example: Suppose we obtained a correlation of 0.57 between the amount of time spent studying and the marks on a given assessment, using a sample of 100 students. Does this show, beyond reasonable doubt, that there is some (linear) connection between study time and results?

Ho: Correlation = 0
Ha: Correlation not = 0
Alpha: .05

We use a two-tailed t-test with DF = n - 2, putting .025 in each tail; the critical values are -1.984 and 1.984. With r = .57 and n = 100:

t = r x sqrt(n - 2) / sqrt(1 - r^2) = 6.87

Since 6.87 lies far beyond the critical value of 1.984, it falls in the Ha region.

Conclusion: There is some linear relationship between study time and results.

Notes:
# Correlations might not be linear, and so may look weaker than they really are.
# Do not confuse correlation with causality.
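The test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) can be computed directly (a sketch; the computed value rounds to 6.87):

```python
import math

r, n = 0.57, 100

# t-test for H0: population correlation = 0, with df = n - 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 2))  # -> 6.87, far beyond the critical values of +/-1.984
```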