CHAPTER 4 DESCRIPTIVE MEASURES IN REGRESSION AND CORRELATION

STP 226 ELEMENTARY STATISTICS

Linear regression and correlation allow us to examine the relationship between two or more quantitative variables.

4.1 Linear Equations with One Independent Variable

y = b0 + b1x is a straight line, where b0 and b1 are constants: b0 is the y-intercept and b1 is the slope of the line.

Slope (b1): for every 1-unit horizontal increase (in x) there is a b1-unit vertical increase or decrease (in y), depending on the line. Slope = change in y / change in x. For example, if the units of x are kg, the units of y are $, and y = 5 + 2x, then the slope is $2 per kg, so for each increase in x by 1 kg, y increases by $2.

The straight-line graph of the linear equation y = b0 + b1x slopes upward if b1 > 0, slopes downward if b1 < 0, and is horizontal if b1 = 0. A vertical line has no slope (its slope is undefined).
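As a quick numerical check of the slope interpretation, here is a minimal Python sketch (mine, not part of the course notes) for the example line y = 5 + 2x; it simply confirms that every 1 kg increase in x raises y by $2.

# Slope interpretation for the line y = b0 + b1*x with b0 = 5, b1 = 2
b0, b1 = 5, 2

def line(x):
    return b0 + b1 * x   # value of the line at x

for x in range(0, 4):
    # line(x + 1) - line(x) always equals the slope b1 = 2
    print(x, line(x), line(x + 1) - line(x))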

4.2 The Regression Equation

In real-life situations the data are not likely to follow a straight line perfectly. A scatterplot (scatter diagram) is useful for visualizing apparent relationships between two variables.

Example (Table 4.5): Age and price of Orions

Car (Orion):      1    2    3    4    5    6    7    8    9   10   11
Age (yr), x:      5    4    6    5    5    5    6    6    2    7    7
Price ($100), y: 85  103   70   82   89   98   66   95  169   70   48

If the points seem to follow a straight line, then a straight line can be used to approximate the relationship. You observe a linear trend if your data points form an elongated cloud shape. If there is no linear trend, you should not fit a linear equation to your data.

Many different lines can be drawn to approximate the relationship; however, the least-squares criterion gives the best-fitting line for the data (the relationship between the two variables).

Least-Squares Criterion
The straight line that best fits a set of data points is the one having the smallest possible sum of squared errors. We define the error as e = y - ŷ, where ŷ = b0 + b1x is the equation of the line. The error e is the signed vertical distance from the data point to the line: e < 0 if the point is below the line, e > 0 if the point is above the line, and e = 0 if the point is on the line.
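The first step in practice is to draw the scatter diagram. The following short Python sketch (an illustration of mine using numpy and matplotlib, not part of the original notes) plots the Orion data so you can check for an elongated, roughly linear cloud of points.

import numpy as np
import matplotlib.pyplot as plt

# Orion data from Table 4.5
age = np.array([5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7])                 # x, years
price = np.array([85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48])  # y, $100s

plt.scatter(age, price)
plt.xlabel("Age (yr)")
plt.ylabel("Price ($100)")
plt.title("Scatter diagram of Orion age vs. price")
plt.show()   # the points form a downward-sloping, roughly linear cloud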

Find the difference (error) between every point and the corresponding point on the line, square those errors, and sum them up. We want this sum to be as small as possible. (The textbook figure that accompanies this section labels the signed distances from the points to the line d_i.)

   minimize over b0, b1:   Σ d_i² = Σ [y_i - (b0 + b1 x_i)]² = Σ e²

Regression Line
The straight line that best fits a set of data points according to the least-squares criterion.

Regression Equation
The equation of the regression line. For the Orion data it is ŷ = 195.47 - 20.26x.
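To make the least-squares criterion concrete, here is an illustrative Python sketch (mine, not the textbook's) that computes the sum of squared errors Σe² for the least-squares line ŷ = 195.47 - 20.26x and for one arbitrary competing line; the least-squares line gives the smaller sum.

import numpy as np

age = np.array([5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7])
price = np.array([85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48])

def sse(b0, b1):
    # sum of squared errors e = y - (b0 + b1*x) for a candidate line
    e = price - (b0 + b1 * age)
    return np.sum(e ** 2)

print(sse(195.47, -20.26))   # least-squares line: roughly 1423
print(sse(160.0, -15.0))     # some other line: a larger sum, roughly 2639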

Notation Used in Regression and Correlation

   Definition                        Computational Formula
   Sxx = Σ(x - x̄)²                   Sxx = Σx² - (Σx)²/n
   Sxy = Σ(x - x̄)(y - ȳ)             Sxy = Σxy - (Σx)(Σy)/n
   Syy = Σ(y - ȳ)²                   Syy = Σy² - (Σy)²/n

Formula 4.1 (Regression Equation)
The regression equation for a set of n data points is ŷ = b0 + b1x, where

   b1 = Sxy / Sxx   and   b0 = (1/n)(Σy - b1 Σx) = ȳ - b1 x̄

Predictor Variable and Response Variable
For the linear regression equation ŷ = b0 + b1x:
   y: response variable (dependent variable)
   x: predictor variable or explanatory variable (independent variable)

Example (Orion data): response variable = price, predictor variable = age.

Extrapolation: making predictions for values of the predictor variable outside the range of the observed values of the predictor variable. Grossly incorrect predictions can result from extrapolation.

Outlier: a data point that lies far from the regression line, relative to the other data points.

Influential observation: a data point whose removal causes the regression equation to change considerably. It is usually separated in the x-direction from the other data points, and it pulls the regression line toward itself.

Warnings on the use of linear regression:
- Draw the scatter diagram first.
- Predict only within the range of the observed data.
- Watch out for influential observations.
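The computational formulas translate directly into code. Here is a small Python sketch (an illustration, not the textbook's program) that reproduces b1 and b0 for the Orion data using Formula 4.1.

import numpy as np

x = np.array([5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7])               # age (yr)
y = np.array([85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48])  # price ($100)
n = len(x)

Sxx = np.sum(x**2) - np.sum(x)**2 / n          # about 20.18
Sxy = np.sum(x*y) - np.sum(x)*np.sum(y) / n    # about -408.91

b1 = Sxy / Sxx                        # slope: about -20.26
b0 = (np.sum(y) - b1*np.sum(x)) / n   # intercept: about 195.47

print(b0, b1)   # regression equation: y-hat = 195.47 - 20.26x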

4.3 The Coefficient of Determination (r²)

One way of measuring the utility of the regression equation is to determine the percentage of the variation in the observed values of the response variable that is explained by the regression (or by the predictor variable).

Example (Orion data):

   x     y       ŷ      y - ȳ    ŷ - ȳ    y - ŷ
   5    85    94.16    -3.64     5.53    -9.16
   4   103   114.42    14.36    25.79   -11.42
   6    70    73.90   -18.64   -14.74    -3.90
   5    82    94.16    -6.64     5.53   -12.16
   5    89    94.16     0.36     5.53    -5.16
   5    98    94.16     9.36     5.53     3.84
   6    66    73.90   -22.64   -14.74    -7.90
   6    95    73.90     6.36   -14.74    21.10
   2   169   154.95    80.36    66.31    14.05
   7    70    53.64   -18.64   -35.00    16.36
   7    48    53.64   -40.64   -35.00    -5.64

Coefficient of determination, r²: the proportion of the variation in the observed values of the response variable that is explained by the regression.

   r² = Sxy² / (Sxx Syy)

Example (cont'd): r² = 8285.0 / 9708.5 = 0.853 (85.3%). Here 8285.0 = Sxy²/Sxx is the regression sum of squares and 9708.5 = Syy is the total sum of squares, so this ratio is the same quantity as Sxy²/(Sxx Syy).
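Continuing the illustrative Python sketch from Section 4.2 (again mine, not the textbook's), r² can be computed either from the formula above or as the ratio of the regression sum of squares to the total sum of squares; both give the same value.

import numpy as np

x = np.array([5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7])
y = np.array([85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48])
n = len(x)

Sxx = np.sum(x**2) - np.sum(x)**2 / n
Syy = np.sum(y**2) - np.sum(y)**2 / n          # total sum of squares, about 9708.5
Sxy = np.sum(x*y) - np.sum(x)*np.sum(y) / n

r2 = Sxy**2 / (Sxx * Syy)                      # about 0.853
print(r2)
print((Sxy**2 / Sxx) / Syy)                    # same value, computed as SSR/SST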

4.4 Linear Correlation

Linear correlation coefficient, r (Pearson product moment correlation coefficient): a statistic used to measure the strength of the linear relationship between two variables.

DEFINITION 4.6
The linear correlation coefficient, r, of n data points is defined by

   r = [1/(n - 1)] Σ [(x - x̄)/sx] [(y - ȳ)/sy],   or equivalently   r = Sxy / sqrt(Sxx Syy),

where sx and sy are the sample standard deviations of x and y.

Example (cont'd): Σx = 58, Σy = 975, Σxy = 4732, Σx² = 326, Σy² = 96129

   r = Sxy / sqrt(Sxx Syy) = [Σxy - (Σx)(Σy)/n] / sqrt{[Σx² - (Σx)²/n][Σy² - (Σy)²/n]} = -0.924

This indicates a strong negative linear relationship between the age and price of Orions.

Understanding the linear correlation coefficient:
a. r reflects the slope of the scatter diagram.
b. The magnitude of r indicates the strength of the linear relationship.
c. The sign of r suggests the type of linear relationship: r = 0 means no linear relation, r > 0 means a positive relation, and r < 0 means a negative relation between the two variables.
d. r has no units.
e. The sign of r is identical to the sign of the slope of the regression line.

Note: the coefficient of determination, r², is the square of the linear correlation coefficient. Example (cont'd): (-0.924)² = 0.854 (the slight difference from 0.853 is due to rounding).

Warnings on the use of the linear correlation coefficient:
a. r measures only the linear relation between the variables.
b. High correlation does not have to imply a causal relationship between x and y.
c. Watch out for spurious correlation (lurking variables): x and y may both be associated with a third variable that produces changes in both.
d. Correlation is affected by extreme observations (as are the mean and standard deviation).
e. Watch out for separate groups.
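Finally, a short Python check (illustrative only, in the same style as the sketches above) of the correlation coefficient and its relationship to r²; numpy's built-in corrcoef gives the same Pearson r as the formula.

import numpy as np

x = np.array([5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7])
y = np.array([85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48])
n = len(x)

Sxx = np.sum(x**2) - np.sum(x)**2 / n
Syy = np.sum(y**2) - np.sum(y)**2 / n
Sxy = np.sum(x*y) - np.sum(x)*np.sum(y) / n

r = Sxy / np.sqrt(Sxx * Syy)       # about -0.924: strong negative linear relation
print(r)
print(np.corrcoef(x, y)[0, 1])     # numpy's Pearson correlation gives the same value
print(r**2)                        # about 0.853, matching the coefficient of determination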