
regression analysis is a type of inferential statistics that tells us whether a relationship exists between two or more variables. eg, sales $ (y - dependent variable) vs. advertising $ (x - independent variable): there is a relationship between the independent variable, x (advertising), and the dependent variable, y (sales). Section 4 - Correlation (25)

correlation tells us if there is a relationship between variables, while regression tells us what type of relationship we have... is it positive or negative? is it linear or nonlinear? Section 4 - Correlation (26)

correlation tells us if there is a relationship between variables, while regression tells us what type of relationship we have... is it a simple or a multiple relationship? what is the relationship (form of the equation)? how good is the relationship? can we use it for predictive purposes? Section 4 - Correlation (27)

simple linear regression: straight lines, 1 independent variable; can be extended to nonlinear and/or multiple regression. when trying to determine whether there is a relationship between variables, start with a scatter plot. [two scatter plots of salary vs. age] Section 4 - Correlation (28)

which variable is independent and which is dependent? in general, the independent variable can be controlled or manipulated and the dependent variable cannot. sometimes it is unclear which is which... eg, tree diameter vs. volume. ask: which variable depends on the other? OR, which variable are you trying to predict? OR, which variable is easier to measure? OR, choose arbitrarily. Section 4 - Correlation (29)

[scatter plot of salary vs. age] first, determine what type of relationship it is; next, come up with the regression line - the best fit of the data. recall that a line is described by y = mx + b, where: m is the slope and b is the y-intercept. Section 4 - Correlation (30)

in regression, the line is described by: ŷ = a + bx, where: ŷ is the predicted value, a is the y-intercept and b is the slope. to describe each point, y_i, we must incorporate an error term, e: y_i = a + bx_i + e_i. Section 4 - Correlation (31)

[scatter plot of salary vs. age] Section 4 - Correlation (32)

given a set of data (age, salary pairs)... how do you find the regression line of best fit? ie, the y-intercept (a) and the slope (b) that best describe the observations: y = a + bx. Section 4 - Correlation (33)

it turns out that the best solution is the one that minimizes the sum of squares of the vertical distances between the line and each point... ie, Σ e_i² (summed over i = 1 to n) is minimized. rationale: the closer the line is to the points, the better the fit will be and the better the predictive power. [scatter plot of salary vs. age with the least squares solution drawn in] Section 4 - Correlation (34)

the least squares solution can be found by: partial derivatives, matrices (linear algebra), or easy-to-use equations. 1. make a table listing all of the x's, y's and the following:

x      y      xy        x²      y²
x_1    y_1    x_1 y_1   x_1²    y_1²
x_2    y_2    x_2 y_2   x_2²    y_2²
x_3    y_3    x_3 y_3   x_3²    y_3²
...    ...    ...       ...     ...
x_n    y_n    x_n y_n   x_n²    y_n²
Σx     Σy     Σxy       Σx²     Σy²

2. sum each column and plug the totals into the y-intercept and slope formulae. Section 4 - Correlation (35)

y-intercept, a:

a = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ n(Σx²) − (Σx)² ]

slope, b:

b = [ n(Σxy) − (Σx)(Σy) ] / [ n(Σx²) − (Σx)² ]

Section 4 - Correlation (36)
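for readers who want to see these column-sum formulas as code, here is a minimal Python sketch (the function name and layout are my own, not from the slides):

# least squares line from the column sums (sketch, not from the original slides)
def least_squares_line(x, y):
    """return (a, b) for the fitted line y-hat = a + b*x"""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b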

eg, the relationship between advertising budget and sales in a forest products firm (in $000's):

Ad (x)    Sales (y)
4.6       87.1
5.1       93.1
4.8       89.8
4.4       91.4
5.9       99.5
4.7       92.1
5.1       95.5
5.2       99.3
4.9       93.4
5.1       94.4

[scatter plot of sales ($000's) vs. advertising budget ($000's)] Section 4 - Correlation (37)

eg, the relationship between advertising budget and sales in a forest products firm (in $000's):

x        y       xy        x²        y²
4.6      87.1    400.66    21.16     7586.41
...      ...     ...       ...       ...
Totals:  49.8    935.6     4671.1    249.54    87670.34

Section 4 - Correlation (38)

eg, the relationship between advertising budget and sales in a forest products firm (in $000's):

a = [ (Σy)(Σx²) − (Σx)(Σxy) ] / [ n(Σx²) − (Σx)² ]

b = [ n(Σxy) − (Σx)(Σy) ] / [ n(Σx²) − (Σx)² ]

Section 4 - Correlation (39)

eg, the relationship between advertising budget and sales in a forest products firm (in $000's): regression line: [scatter plot of sales ($000's) vs. advertising budget ($000's) with the fitted regression line] Section 4 - Correlation (40)
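the fitted equation itself is not shown here; working the column totals above through the y-intercept and slope formulas myself (so these are my numbers, not the slide's) gives a ≈ 55.26 and b ≈ 7.69, i.e. approximately ŷ = 55.26 + 7.69x, with both sales and advertising in $000's.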

how good is our prediction? in other words, how well does our line fit the data (note that the best fitting line may still have a lot of variation). 3 methods: 1. correlation coefficient (r), 2. coefficient of determination (r²), 3. significance test. Section 4 - Correlation (41)

1. correlation coefficient (r): check to see how well our variables are correlated. in other words, how strong is the relationship between the dependent and independent variables? many ways to measure this, but the most common is the Pearson Product Moment Correlation Coefficient (PPMC), which measures the strength and direction of a relationship between variables, given by:

r = [ n(Σxy) − (Σx)(Σy) ] / sqrt{ [ n(Σx²) − (Σx)² ][ n(Σy²) − (Σy)² ] }

Section 4 - Correlation (42)
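the PPMC formula translates directly into the same sketch style as before; this helper (name mine, not from the slides) is a rough illustration, assuming plain Python lists of equal length:

# Pearson product moment correlation coefficient from the column sums (sketch)
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    num = n * sum_xy - sum_x * sum_y
    den = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den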

1. correlation coefficient (r): the correlation coefficient (r) will lie between -1 and 1 [number line: -1 = strong, negative relationship; 0 = no relationship; 1 = strong, positive relationship]. from our eg... Section 4 - Correlation (43)
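the numeric value is not shown on this slide; plugging the advertising/sales column totals into the PPMC formula myself gives roughly r ≈ 0.82, a strong positive relationship (consistent with the r² of about 67% quoted on the coefficient-of-determination slides below).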

2. coefficient of determination (r²): in order to understand r², we must analyze the regression line: [plot of y vs. x showing the regression line ŷ and the mean line ȳ] Section 4 - Correlation (44)

2. coefficient of determination (r²): this implies that total variation is made up of 2 types of variation: explained variation is the variation explained by the regression line; unexplained variation is due to random error.

Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²
(n − 1) = (1) + (n − 2)
σ²_total = σ²_regression + σ²_residual

Section 4 - Correlation (45)

2. coefficient of determination (r²): computed as follows:

r² = (explained variation / total variation) × 100% = [ Σ(ŷ − ȳ)² / Σ(y − ȳ)² ] × 100%

the coefficient of determination will always be: 0% ≤ r² ≤ 100%. Section 4 - Correlation (46)
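as a worked check using the advertising data (my own arithmetic, not from the slides): total variation Σ(y − ȳ)² ≈ 135.6, unexplained variation Σ(y − ŷ)² ≈ 44.8, so explained variation ≈ 90.8 and r² ≈ 90.8 / 135.6 ≈ 0.67, i.e. about 67%.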

2. coefficient of determination (r²): an easier method is to simply square the PPMC (correlation coefficient). meaning? 66.9% of the variation of the dependent variable is explained by (or accounted for by) the independent variable. the regression line means that x and y are correlated, so that as x increases so too does y; 66.9% of the variation in y is accounted for by the regression line. Section 4 - Correlation (47)

2. coefficient of determination (r²): the higher the r², the better the predictive power of the regression line. Section 4 - Correlation (48)

3. significance test: we can also test the significance of the regression equation itself. a regression is significant only if σ²_regression > σ²_residual. this requires an F-test, in which the regression is significant only if we reject the null hypothesis, H_0:

H_0: σ²_regression / σ²_residual = 1
H_1: σ²_regression / σ²_residual > 1

Section 4 - Correlation (49)
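a rough worked version for the advertising data (my own arithmetic, not from the slides): MS_regression = 90.8 / 1 = 90.8 and MS_residual = 44.8 / 8 ≈ 5.6, so F ≈ 16.2; since this is well above the tabled F_0.05(1, 8) ≈ 5.32, H_0 would be rejected and the regression judged significant.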

prediction: with regression, we are predicting y values from x values. we are essentially inferring onto our dependent variable population and can build confidence intervals for this purpose. to do so, we need to compute the standard error of estimate of the regression line:

s_est = sqrt[ Σ(y − ŷ)² / (n − 2) ]

working formula:

s_est = sqrt[ (Σy² − aΣy − bΣxy) / (n − 2) ]

Section 4 - Correlation (50)

prediction:

s_est = sqrt[ (Σy² − aΣy − bΣxy) / (n − 2) ]

from our example, ... Section 4 - Correlation (51)

prediction: the standard error of estimate is a measure of variation of the observations around the regression line (the standard deviation of the y values [data] around the predicted values [line]). from our example, ... Section 4 - Correlation (52)

prediction: we use the standard error of estimate to build confidence intervals around the predictions. to do this we need to calculate the standard error of a predicted y value:

s_ŷ = s_est × sqrt[ 1 + 1/n + (x − x̄)² / Σ(x_i − x̄)² ]

and construct the confidence interval:

P( ŷ − t_(α/2) s_ŷ < y < ŷ + t_(α/2) s_ŷ ) = 1 − α, with t based on n − 2 degrees of freedom

Section 4 - Correlation (53)

prediction: eg, what is the 95% confidence interval of sales with an advertising budget of $5,500?

x̄ = 4.98    SS_x = 1.54    s_est = 2.336

Section 4 - Correlation (54)

prediction: eg, what is the 95% confidence interval of sales with an advertising budget of $5,500? Section 4 - Correlation (55)
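to close the example, here is a small Python sketch that runs the whole prediction-interval calculation for the $5,500 budget; the code and variable names are mine, the data are the ten (x, y) pairs from the earlier slide, the t value 2.306 is the tabled t_0.025 with n − 2 = 8 degrees of freedom, and the printed numbers are my own arithmetic (they may differ slightly from the slide's rounded figures):

# 95% prediction interval for sales at an advertising budget of $5,500 (sketch)
from math import sqrt

x = [4.6, 5.1, 4.8, 4.4, 5.9, 4.7, 5.1, 5.2, 4.9, 5.1]            # advertising ($000's)
y = [87.1, 93.1, 89.8, 91.4, 99.5, 92.1, 95.5, 99.3, 93.4, 94.4]  # sales ($000's)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom    # y-intercept, about 55.26
b = (n * sum_xy - sum_x * sum_y) / denom         # slope, about 7.69

s_est = sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))   # standard error of estimate

x_new = 5.5                                      # $5,500 advertising budget
x_bar = sum_x / n                                # mean advertising, about 4.98
ss_x = sum((xi - x_bar) ** 2 for xi in x)        # SS_x, about 1.54
y_hat = a + b * x_new                            # predicted sales, about 97.6

s_pred = s_est * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / ss_x)
t_crit = 2.306                                   # t_0.025 with 8 df, from a t table
low, high = y_hat - t_crit * s_pred, y_hat + t_crit * s_pred
print(f"95% interval: {low:.1f} to {high:.1f} ($000's)")

this prints an interval of roughly 91.4 to 103.7, i.e. predicted sales of about $91,400 to $103,700 for a $5,500 advertising budget.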