Chapter 10. Regression. Understandable Statistics, Ninth Edition, by Brase and Brase. Prepared by Yixun Shi, Bloomsburg University of Pennsylvania


Scatter Diagrams A graph in which pairs of points (x, y) are plotted, with x on the horizontal axis and y on the vertical axis. The explanatory variable is x; the response variable is y. One goal of plotting paired data is to determine whether there is a linear relationship between x and y. Copyright Houghton Mifflin Harcourt Publishing Company. All rights reserved.

Finding the Best Fitting Line One goal of data analysis is to find a mathematical equation that best represents the data. For our purposes, best means the line that comes closest to each point on the scatter diagram.

Not All Relationships are Linear

Correlation Strength Correlation is a measure of the linear relationship between two quantitative variables. If the points lie exactly on a straight line, we say there is perfect correlation. As the strength of the relationship decreases, the correlation moves from perfect to strong to moderate to weak.

Correlation If the response variable, y, tends to increase as the explanatory variable, x, increases, the variables have positive correlation. If the response variable, y, tends to decrease as the explanatory variable, x, increases, the variables have negative correlation.

Correlation Examples

Correlation Coefficient The mathematical measurement that describes the correlation is called the sample correlation coefficient. Denoted by r.

Correlation Coefficient Features 1) r has no units. 2) −1 ≤ r ≤ 1. 3) Positive values of r indicate a positive relationship between x and y (likewise for negative values). 4) r = 0 indicates no linear relationship. 5) Switching the explanatory variable and response variable does not change r. 6) Changing the units of the variables does not change r.
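Features 1), 5), and 6) can be verified numerically. A minimal Python sketch (the data set is hypothetical, chosen only for illustration), computing r from its deviation-from-the-mean form:

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))  # 0.7746: a unitless value between -1 and 1

# Feature 5: swapping explanatory and response variables leaves r unchanged.
print(math.isclose(correlation(y, x), r))                        # True
# Feature 6: rescaling the units (e.g. inches to centimeters) leaves r unchanged.
print(math.isclose(correlation([2.54 * xi for xi in x], y), r))  # True
```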

r = 0

r = 1 or r = −1

0 < r < 1

−1 < r < 0

Developing a Formula for r

A Computational Formula for r
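The computational formula needs only n and the sums Σx, Σy, Σxy, Σx², and Σy². A sketch in Python with hypothetical data:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# r = (n Σxy - Σx Σy) / (sqrt(n Σx² - (Σx)²) · sqrt(n Σy² - (Σy)²))
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2))
print(round(r, 4))  # 0.7746
```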

Critical Thinking: Population vs. Sample Correlation Beware: high correlations do not imply a cause-and-effect relationship between the explanatory and response variables!

Critical Thinking: Lurking Variables A lurking variable is neither the explanatory variable nor the response variable. Beware: lurking variables may be responsible for the evident changes in x and y.

Correlations Between Averages If the two variables of interest are averages rather than raw data values, the correlation between the averages will tend to be higher!

Linear Regression When we are in the paired-data setting: Do the data points have a linear relationship? Can we find an equation for the best-fitting line? Can we predict the value of the response variable for a new value of the predictor variable? What fractional part of the variability in y is associated with the variability in x?

Least-Squares Criterion The line that we fit to the data will be such that the sum of the squared distances from the line is minimized. Let d = the vertical distance from any data point to the regression line.

Least-Squares Criterion

Least-Squares Line ŷ = a + bx, where a is the y-intercept and b is the slope.

Finding the Regression Equation
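As a sketch of the computation the slides refer to, the least-squares slope b and intercept a follow from the same sums used for r (the data are hypothetical):

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# b = (n Σxy - Σx Σy) / (n Σx² - (Σx)²),  a = ȳ - b x̄
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)
print(round(b, 2), round(a, 2))  # 0.6 2.2, i.e. ŷ = 2.2 + 0.6x
```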

Using the Regression Equation The point (x̄, ȳ) is always on the least-squares line. Also, the slope tells us that if we increase the explanatory variable by one unit, the response variable will change by the slope (increase or decrease, depending on the sign of the slope).

Critical Thinking: Making Predictions We can simply plug x values into the regression equation to calculate y values. Extrapolation can result in unreliable predictions.

Coefficient of Determination Another way to gauge the fit of the regression equation is to calculate the coefficient of determination, r². We can view the mean of y as a baseline for all the y values. Then the deviation of each y from the mean splits into an explained part and an unexplained part: y − ȳ = (ŷ − ȳ) + (y − ŷ).

Coefficient of Determination Squaring and summing over the data, we algebraically manipulate the above equation to obtain: r² = Σ(ŷ − ȳ)² / Σ(y − ȳ)².

Coefficient of Determination Thus, r² measures the proportion of the total variability in y that is explained by the regression on x.

Coefficient of Determination
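A sketch of the explained-over-total computation (the data are hypothetical; the line is fitted in place so the example is self-contained):

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Fit the least-squares line ŷ = a + bx.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

explained = sum((yh - mean_y) ** 2 for yh in y_hat)  # variation explained by the line
total = sum((yi - mean_y) ** 2 for yi in y)          # total variation about ȳ
r_squared = explained / total
print(round(r_squared, 2))  # 0.6: 60% of the variability in y is explained by x
```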

Inferences on the Regression Equation We can make inferences about the population correlation coefficient, ρ, and the population regression line slope, β.

Inferences on the Regression Equation Inference requirements: The set of ordered pairs (x, y) constitutes a random sample from all ordered pairs in the population. For each fixed value of x, y has a normal distribution. All the y distributions have equal variance and a mean that lies on the population regression line.

Testing the Correlation Coefficient

Testing the Correlation Coefficient We convert r to a Student's t distribution with n − 2 degrees of freedom. The sample size must be at least 3.
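The conversion is t = r·sqrt(n − 2) / sqrt(1 − r²), with d.f. = n − 2. A sketch with hypothetical values:

```python
import math

r, n = 0.7746, 5  # sample correlation and sample size (hypothetical)

t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
df = n - 2
print(round(t, 3), df)  # about 2.121 with 3 degrees of freedom
```

The statistic is then compared with a critical t value (or used to find a P-value) to decide whether ρ differs from zero.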

Testing Procedure

Standard Error of Estimate As the points get closer to the regression line, S_e decreases. If all the points lie exactly on the line, S_e will be zero.

Computational Formula for S_e
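The computational formula is S_e = sqrt((Σy² − aΣy − bΣxy) / (n − 2)). A sketch with hypothetical data and the corresponding fitted line:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
a, b = 2.2, 0.6  # intercept and slope of the least-squares line for these data

sum_y = sum(y)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

sse = sum_y2 - a * sum_y - b * sum_xy  # sum of squared residuals
se = math.sqrt(sse / (n - 2))
print(round(se, 4))  # 0.8944
```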

Confidence Intervals for y The population regression line: y = α + βx + ε, where ε is random error. Because of ε, for each x value there is a corresponding distribution for y. Each y distribution has the same standard deviation, estimated by S_e. Each y distribution is centered on the regression line.

Confidence Intervals for Predicted y

Confidence Intervals for Predicted y The confidence intervals will be narrower near the mean of x. The confidence intervals will be wider near the extreme values of x.
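A sketch of the interval computation, using the margin of error E = t_c · S_e · sqrt(1 + 1/n + n(x − x̄)² / (nΣx² − (Σx)²)); the data, fitted values, and 95% confidence level are illustrative:

```python
import math

x = [1, 2, 3, 4, 5]
n = len(x)
a, b, se = 2.2, 0.6, 0.8944  # fitted line and standard error of estimate
t_c = 3.182                  # critical t for 95% confidence, d.f. = n - 2 = 3

x_new = 4
y_hat = a + b * x_new

sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
mean_x = sum_x / n

# The (x_new - mean_x)² term is why intervals widen away from the mean of x.
margin = t_c * se * math.sqrt(1 + 1 / n
                              + n * (x_new - mean_x) ** 2 / (n * sum_x2 - sum_x ** 2))
print(round(y_hat - margin, 2), round(y_hat + margin, 2))  # roughly 1.36 to 7.84
```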

Inferences About Slope β Recall the population regression equation: y = α + βx + ε. Let S_e be the standard error of estimate from the sample.

Inferences About Slope β The test statistic has a Student's t distribution with n − 2 degrees of freedom.

Testing β

Finding a Confidence Interval for β
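A sketch of both procedures. The standard error of the slope is S_e / sqrt(Σx² − (Σx)²/n); the test statistic is b divided by that, and the confidence interval is b ± t_c times it. Data and confidence level are illustrative:

```python
import math

x = [1, 2, 3, 4, 5]
n = len(x)
b, se = 0.6, 0.8944  # slope and standard error of estimate for these data
t_c = 3.182          # critical t for 95% confidence, d.f. = n - 2 = 3

sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)

se_b = se / math.sqrt(sum_x2 - sum_x ** 2 / n)  # standard error of the slope b
t = b / se_b                                    # test statistic for H0: β = 0
lower, upper = b - t_c * se_b, b + t_c * se_b
print(round(t, 3))                       # about 2.121
print(round(lower, 2), round(upper, 2))  # interval contains 0: fail to reject H0
```

For simple linear regression this t equals the t obtained when testing the correlation coefficient; the two procedures are equivalent.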

Multiple Regression In practice, most predictions will improve if we consider additional data. Statistically, this means adding predictor variables to our regression techniques.

Multiple Regression Terminology y is the response variable. x_1 through x_k are explanatory variables. b_0 through b_k are coefficients.

Regression Models A mathematical package that includes the following: 1) A collection of variables, one of which is the response and the rest of which are predictors. 2) A collection of numerical values associated with the given variables. 3) A least-squares regression equation constructed, using software, from the above numerical values.

Regression Models 4) The model includes information about the variables, coefficients of the equation, and various goodness-of-fit measures. 5) The model allows the user to make predictions and build confidence intervals for various components of the equation.

Coefficient of Multiple Determination How well does our equation fit the data? Just as in linear regression, computer output will provide r 2. r 2 is a measure of the amount of variability in y that is explained by the regression on all of the x variables.

Predictions in the Multiple Regression Case Similar to the techniques described for linear regression, we can plug in values for all of the x variables and compute a predicted y value. Predicting y for any x values outside the x-range is extrapolation, and the results will be unreliable.
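A sketch using NumPy's least-squares solver in place of the statistical software the slides mention; the data set is fabricated (it satisfies y = 1 + 1.5x₁ + 0.5x₂ exactly) so the recovered coefficients are easy to check:

```python
import numpy as np

# Hypothetical data: one response y and two explanatory variables x1, x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.5, 4.5, 7.5, 8.5, 11.5, 12.5])  # equals 1 + 1.5*x1 + 0.5*x2

# Design matrix with a leading column of ones for the constant b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef
print(np.round(coef, 2))  # [1.  1.5 0.5]

# Predict y at x1 = 3, x2 = 2: inside the observed x-ranges, so not extrapolation.
y_pred = b0 + b1 * 3.0 + b2 * 2.0
print(round(y_pred, 2))  # 6.5
```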

Testing a Coefficient for Significance At times, one or more of the predictor variables may not be helpful in determining y. Using software, we can test any or all of the x variables for statistical significance.

Testing a Coefficient for Significance The hypotheses will be: H_0: β_i = 0 versus H_1: β_i ≠ 0 for the i-th coefficient in the model. If we fail to reject the null hypothesis for a given coefficient, we may consider removing that predictor from the model.

Confidence Intervals for Coefficients As in the linear case, software can provide confidence intervals for any β_i in the model.