Correlation and Regression

1 Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1

2 Outline of Topics
1. Associations
2. Scatter plot
3. Correlation
4. Regression
5. Testing and estimation
6. Goodness-of-fit

3 Example
We are often interested in the association between two or more variables. Suppose the Midterm (X) and Final (Y) exam scores of a sample of n = 8 students are recorded, and we wish to study the association between X and Y in the population of students. The sample is an SRS of independent observations.
[Table: Midterm (X) and Final (Y) scores of the 8 students]
We consider three approaches:
(1) a graphical summary: the scatter plot (cf. Class 3)
(2) a numerical measure: the correlation coefficient (cf. Class 3)
(3) a model: regression

4 Scatter plot (1): Example
Each observation (student) is represented by a symbol on the plot. A scatter plot is useful for giving an overall impression of the kind of relationship between the variables, e.g., linear, nonlinear, or no apparent relationship.
[Figure: scatter plot of Final against Midterm, with example panels labelled linear, nonlinear, and none]

5 Scatter plot (2)
Outliers are observations that deviate from the general trend of the rest of the data. If we have a new observation (X, Y) = (99, 10), it will appear as the red open circle, and the scatter plot shows that the new observation is unusual. Scatter plots are generally not useful when there are more than two variables, e.g., Projects, Midterm, Final, etc.
[Figure: scatter plot of Final against Midterm, with the new observation drawn as a red open circle]

6 Pearson correlation (Karl Pearson)
In Class 3, $\mathrm{cov}(X, Y)$ was used to measure the association between X and Y.
[Figure: two scatter plots of Y (Final) against X (Midterm), each labelled with its value of cov(X, Y)]
$\mathrm{cov}(X, Y)$ is not invariant to scale transformation, e.g., its value changes if the Midterm is recorded on (0, 10) instead of (0, 100). The sign of $\mathrm{cov}(X, Y)$ (+ vs. -) tells us the direction of the association, but its magnitude has no direct interpretation.

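7 Pearson correlation (Karl Pearson)
A Pearson (product-moment) correlation coefficient, $r \equiv \mathrm{corr}(X, Y)$, is a number that summarizes the linear relationship between X and Y. For X from a population with mean $\mu_X$ and variance $\sigma_X^2$, the Z-score
$$Z_X = \frac{X - \mu_X}{\sigma_X}$$
tells us where X lies relative to the rest of the population. The correlation is the average (expected) product of the two Z-scores:
$$r = E(Z_X Z_Y) = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right] = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$$
It measures, on average, whether X and Y are in tandem relative to their populations. Using n observations $(X_1, Y_1), \ldots, (X_n, Y_n)$:
$$r = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}\,\sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \,\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$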
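The two forms of the sample correlation can be checked against each other numerically. The following is a minimal sketch, not the course's own code: the scores are hypothetical (the slides' data table is not available here), and the helper names `mean`, `sample_sd`, `r_from_z_scores` and `r_from_sums` are illustrative.

```python
import math

# Hypothetical scores for 8 students (NOT the data used on the slides).
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def r_from_z_scores(x, y):
    """Average product of sample z-scores, using the n - 1 divisor."""
    mx, my, sx, sy = mean(x), mean(y), sample_sd(x), sample_sd(y)
    products = [((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


def r_from_sums(x, y):
    """Ratio form of r, in which the n - 1 factors cancel."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_z = r_from_z_scores(midterm, final)
r_s = r_from_sums(midterm, final)
print(f"r from z-scores = {r_z:.4f}, r from sums = {r_s:.4f}")
```

Both functions should agree up to floating-point rounding, which is the cancellation of the n - 1 factors shown in the last equality.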

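8 Correlation: Example
For calculation, the equivalent formula is more convenient:
$$r = \frac{\dfrac{\sum X_i Y_i - n\bar{X}\bar{Y}}{n-1}}{\sqrt{\dfrac{\sum X_i^2 - n\bar{X}^2}{n-1}}\,\sqrt{\dfrac{\sum Y_i^2 - n\bar{Y}^2}{n-1}}} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X_i^2 - n\bar{X}^2\right)\left(\sum Y_i^2 - n\bar{Y}^2\right)}}$$
X recorded as (0, 100): $\bar{X} = 67.375$, $\bar{Y} = 65$, $\sum_{i=1}^{8} X_i Y_i = [\ldots]$, $\sum_{i=1}^{8} X_i^2 = 38493$, $\sum_{i=1}^{8} Y_i^2 = [\ldots]$, so
$$r = \frac{[\ldots] - 8(67.375)(65)}{\sqrt{\left(38493 - 8(67.375)^2\right)\left([\ldots] - 8(65)^2\right)}} = [\ldots]$$
X recorded as (0, 10): $\bar{X} = 6.7375$, $\bar{Y} = 65$, $\sum_{i=1}^{8} X_i Y_i = [\ldots]$, $\sum_{i=1}^{8} X_i^2 = [\ldots]$, $\sum_{i=1}^{8} Y_i^2 = [\ldots]$, so
$$r = \frac{[\ldots] - 8(6.7375)(65)}{\sqrt{\left([\ldots] - 8(6.7375)^2\right)\left([\ldots] - 8(65)^2\right)}} = [\ldots]$$
Unlike $\mathrm{cov}(X, Y)$, r takes the same value under both recordings.
On average, $Z_X Z_Y = [\ldots] > 0$, so $Z_X$ and $Z_Y$ are of the same sign (both + or both -): they are either both big or both small relative to their own populations.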
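The effect of rescaling X can be illustrated with the computational formula. This is a sketch under stated assumptions: the scores below are hypothetical, not the slide's data, and `cov_and_r` is an illustrative helper name.

```python
import math

# Hypothetical scores (NOT the slide's data), with the Midterm on two scales.
midterm_100 = [35, 48, 60, 66, 70, 78, 85, 92]   # recorded on (0, 100)
midterm_10 = [x / 10 for x in midterm_100]        # recorded on (0, 10)
final = [45, 52, 70, 58, 72, 60, 88, 79]


def cov_and_r(x, y):
    """Sample covariance and correlation via the computational formula."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    syy = sum(yi * yi for yi in y) - n * y_bar ** 2
    return sxy / (n - 1), sxy / math.sqrt(sxx * syy)


cov_100, r_100 = cov_and_r(midterm_100, final)
cov_10, r_10 = cov_and_r(midterm_10, final)
print(f"scale (0, 100): cov = {cov_100:.3f}, r = {r_100:.4f}")
print(f"scale (0, 10):  cov = {cov_10:.3f}, r = {r_10:.4f}")
```

Dividing X by 10 divides the covariance by 10 but leaves r unchanged, because the scale factor appears in both the numerator and the denominator of r.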

9 Sample correlation under various relationships (Fig. 3)
$-1 \le r \le 1$
The magnitude of r measures the strength of the association. If $|r| \approx 1$, the association is strong (B, C and D); if $r \approx 0$, the association is weak (A) or non-linear.
The sign of r measures the direction of the association. If r > 0, large X tends to be associated with large Y (B and C); if r < 0, large X tends to be associated with small Y (D).
[Figure 3: four scatter plots of Y against X, panels (A) to (D), each labelled with its value of r]

10 Correlation measures linear relationships (Fig. 4)
r measures linear associations (A).
A non-linear relationship may distort the value of r (B).
Outliers may distort the value of r (C).
A restrictive range (open circles) in X or Y may lead to a smaller r (D).
[Figure 4: four scatter plots, panels A to D]

11 Prediction under a linear model (Fig. 5)
A regression analysis allows us to determine whether the Midterm score (X) can be used to predict the Final score (Y). The scatter plot suggests there may be a linear relationship between X and Y (i.e., each additional point in the Midterm is associated with b extra points in the Final). A regression analysis uses a sample of students to determine whether a linear relationship exists for the population of students.
[Figure 5: scatter plot of Final against Midterm]

12 Simple linear regression
We postulate that the relationship between Midterm score (X) and Final score (Y) in the population can be represented by a straight line:
$$Y = a + bX$$
where a is the intercept and b is the slope. The variable X is called an independent or predictor variable and Y is called a dependent or outcome variable. A simple linear regression is a regression with only one predictor, in which the relationship between the predictor and the outcome variable is assumed to be linear.
The intercept a gives the prediction of Y when X = 0 or b = 0. Often a is not of interest or may even be meaningless, e.g., if X represents the height of a person and Y represents the weight, then no person has a height (X) of zero. The value of b is the change in Y for every unit difference in X.
Figure 5 shows that the observations do not fall on a straight line. In fact, there is no straight line that fits all observations. We therefore assume
$$Y = a + bX + e, \quad e \sim N(0, \sigma^2)$$

13 Simple linear regression (2)
$$Y = \underbrace{a + bX}_{(A)} + \underbrace{e}_{(B)}, \quad e \sim N(0, \sigma^2)$$
(A) $a + bX$ is the average value of Y for observations with a particular value of X.
(B) Each observation Y differs from the average by an amount e, and $e \sim N(0, \sigma^2)$.
(A)+(B) For each known value of X, the values of $Y \sim N(a + bX, \sigma^2)$.
Therefore, in a regression, we assume we have known values of X at $X_1, \ldots, X_n$ and we investigate how Y changes at these values, which is captured by the regression model. We use maximum likelihood estimation (MLE), which is equivalent to a method called ordinary least squares (OLS) in this setting.

14 Maximum Likelihood (1)
[Figure: data table of Midterm (X) and Final (Y), with the means a + b(55), a + b(60) and a + b(65) marked at X = 55, 60 and 65]

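15 Maximum Likelihood (2)
We have a sample $Y_1, \ldots, Y_n$ at $X_1, \ldots, X_n$, respectively. Assuming $Y_i \sim N(a + bX_i, \sigma^2)$, where $a, b, \sigma^2$ are unknown, we can find the MLE of these parameters. The MLEs are the values of $a, b, \sigma^2$ that jointly maximize the likelihood
$$L(a, b, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\{Y_i - (a + bX_i)\}^2}{2\sigma^2}\right)$$
Taking the (natural) logarithm of $L(a, b, \sigma^2)$ gives the log-likelihood
$$l(a, b, \sigma^2) = \sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\{Y_i - (a + bX_i)\}^2}{2\sigma^2}\right)\right] = \sum_{i=1}^{n} \left[-\frac{\{Y_i - (a + bX_i)\}^2}{2\sigma^2} - \frac{1}{2}\log 2\pi - \frac{1}{2}\log \sigma^2\right]$$
The MLEs are found by solving
$$\frac{\partial l(\hat{a}, \hat{b}, \hat{\sigma}^2)}{\partial a} = 0, \quad \frac{\partial l(\hat{a}, \hat{b}, \hat{\sigma}^2)}{\partial b} = 0, \quad \frac{\partial l(\hat{a}, \hat{b}, \hat{\sigma}^2)}{\partial \sigma^2} = 0$$
which gives
$$\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n(\bar{X})^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$$
$$\hat{a} = \bar{Y} - \hat{b}\bar{X}, \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \{Y_i - (\hat{a} + \hat{b}X_i)\}^2$$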

16 Least squares
For any value of $\sigma^2$ in the log-likelihood
$$l(a, b, \sigma^2) = \sum_{i=1}^{n} \left[-\frac{\{Y_i - (a + bX_i)\}^2}{2\sigma^2} - \frac{1}{2}\log 2\pi - \frac{1}{2}\log \sigma^2\right]$$
$l(a, b, \sigma^2)$ is maximized if
$$\sum_{i=1}^{n} \{Y_i - (a + bX_i)\}^2$$
is minimized (hence "least squares"). The best fitting line using MLE or OLS is the line that minimizes the sum of squared deviations of the observations from the line.
[Figure: scatter plot of Final against Midterm with the fitted line]
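The equivalence between maximizing the log-likelihood and minimizing the sum of squares can be checked on a small example. This is a sketch only: the data are hypothetical (not the slides' data), and the function names and the grid of perturbed candidate lines are illustrative choices.

```python
import math

# Hypothetical scores for 8 students (NOT the slides' data).
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]


def sse(a, b, x, y):
    """Sum of squared deviations of the observations from the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def log_likelihood(a, b, sigma2, x, y):
    """Normal log-likelihood of the simple linear regression model."""
    n = len(x)
    return -sse(a, b, x, y) / (2 * sigma2) - 0.5 * n * math.log(2 * math.pi * sigma2)


def closed_form_estimates(x, y):
    """Closed-form MLE (= OLS) estimates of a, b and sigma^2 (divisor n)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat, sse(a_hat, b_hat, x, y) / n


a_hat, b_hat, sigma2_hat = closed_form_estimates(midterm, final)

# Candidate lines: the fitted line plus small perturbations of it.
candidates = [(a_hat + da, b_hat + db) for da in (-2, 0, 2) for db in (-0.05, 0, 0.05)]
best_by_sse = min(candidates, key=lambda ab: sse(ab[0], ab[1], midterm, final))
best_by_loglik = max(
    candidates,
    key=lambda ab: log_likelihood(ab[0], ab[1], sigma2_hat, midterm, final),
)
print("same line chosen by both criteria:", best_by_sse == best_by_loglik)
```

Among the candidates, the line with the smallest sum of squares is also the line with the largest log-likelihood, and it is the closed-form fitted line.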

17 Example
Using our sample of n = 8 students, what is the predicted Final score for a student who scored 65 on the Midterm using the MLE (OLS) estimates?
$$\hat{b} = \frac{[\ldots] - 8(67.375)(65)}{38493 - 8(67.375)^2} = 0.59, \quad \hat{a} = 65 - 0.59(67.375) = [\ldots]$$
The fitted regression line is
$$\text{Final} = [\ldots] + 0.59 \times \text{Midterm}$$
For a student whose Midterm score is 65, her predicted Final score is $[\ldots] + 0.59(65) = [\ldots]$.

18 Quality of the regression - Residual plots
Under the regression model
$$Y_i = a + bX_i + e_i, \quad e_i \sim N(0, \sigma^2)$$
the residuals are
$$\hat{e}_i = Y_i - \hat{Y}_i = Y_i - (\hat{a} + \hat{b}X_i)$$
If the model is correct, the $\hat{e}_i$'s should resemble a set of random observations from a normal distribution with mean zero, like panel (a).
[Figure: residuals plotted against X in four panels: (a) Random residuals, (b) Non-linear, (c) Skewed distribution, (d) Non-constant variance]

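19 Residual plot - Example
Based on the regression model $\hat{Y} = [\ldots] + 0.59X$:
$$\hat{e}_i = Y_i - \hat{Y}_i = Y_i - ([\ldots] + 0.59X_i)$$
[Table: $Y_i$, $\hat{Y}_i$ and $\hat{e}_i$ for the 8 students]
[Figure: residuals plotted against X, with a reference line at 0]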
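A residual table of this kind can be produced in a few lines. The sketch below uses hypothetical scores (not the slide's data) and an illustrative helper `fit_line`; it also checks two properties that OLS residuals always satisfy: they sum to zero and are uncorrelated with X.

```python
# Hypothetical scores for 8 students (NOT the slide's data).
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]


def fit_line(x, y):
    """Return the OLS (= MLE) intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


a_hat, b_hat = fit_line(midterm, final)
fitted = [a_hat + b_hat * xi for xi in midterm]
residuals = [yi - fi for yi, fi in zip(final, fitted)]

print("    X     Y    fitted  residual")
for xi, yi, fi, ei in zip(midterm, final, fitted, residuals):
    print(f"{xi:5d} {yi:5d} {fi:9.2f} {ei:9.2f}")

# Both sums are zero up to rounding, by the first-order conditions for a and b.
sum_resid = sum(residuals)
sum_x_resid = sum(xi * ei for xi, ei in zip(midterm, residuals))
```

Plotting `residuals` against `midterm` would give the residual plot; a pattern like panel (a) supports the model.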

20 Notes about a regression analysis
A linear regression model makes 3 assumptions:
1. The relationship between X and Y is linear, i.e., $Y = a + bX + e$
2. The values of the $Y_i$'s are normally distributed about the regression line
3. The variances of the $Y_i$'s about the regression line are the same
The regression line is fitted by MLE (= OLS), which means the sum of the squared distances of the observations to the regression line is minimized.
Prediction can only be made in the range of X used to obtain the regression line. In the example, the lowest and the highest Midterm scores among the 8 students are 35 and 92, so prediction can be made for other students whose Midterm scores are within this range. For someone whose Midterm score falls outside (35, 92), no prediction is possible. This restriction does not apply to the dependent variable, so the predicted Final score can be outside the range of Y values observed in the 8 students.
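The range restriction can be built into a prediction helper. This is a minimal sketch: the scores are hypothetical (chosen so that the Midterm range is 35 to 92, as in the example), and `fit_line` and `predict_within_range` are illustrative names.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical scores for 8 students (NOT the slides' data); Midterm spans 35 to 92.
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the OLS (= MLE) intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


def predict_within_range(
    x_new: float, x: Sequence[float], y: Sequence[float]
) -> Optional[float]:
    """Predict Y at x_new, or return None if x_new is outside the observed X range."""
    if not min(x) <= x_new <= max(x):
        return None  # extrapolation: the fitted line is not used here
    a_hat, b_hat = fit_line(x, y)
    return a_hat + b_hat * x_new


print(predict_within_range(65, midterm, final))  # inside the range: a number
print(predict_within_range(99, midterm, final))  # outside the range: None
```

Returning `None` rather than a number makes the caller handle the out-of-range case explicitly; raising an exception would be an equally reasonable design.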

21 Observed relationship: Fact or Fiction?
$$\text{Final} = \overbrace{[\ldots]}^{\hat{a}} + \overbrace{0.59}^{\hat{b}} \times \text{Midterm}$$
shows that each additional point in the Midterm is associated with an extra 0.59 point in the Final for the 8 students.
Our estimate $\hat{b}$ comes from a sample and hence there is sampling error, i.e., $\hat{b} \neq b$.
Does the association generalise to the population of students? There are two approaches to answering this question:
(1) Test the hypotheses: $H_0: b = 0$ (no relationship) vs. $H_1: b \neq 0$ (some relationship)
(2) Find an interval estimate: $\hat{b} \pm$ margin of error of $\hat{b}$

22 Hypothesis testing
For a sample of students such that midterm (X) and final (Y) are unrelated:
(1) $\hat{b}$ is expected to be zero
(2) sampling variation allows $\hat{b} \neq 0$, but it is unlikely to be far from 0
[Figure: distribution of the value of $\hat{b}$ centred at 0; values between the critical values are expected, and the 5% of values beyond them are unexpected]
We use a test statistic to determine whether $\hat{b}$ for our sample is far from 0:
$$z = \frac{\overbrace{\hat{b}}^{\text{our sample}} - \overbrace{0}^{X \text{ and } Y \text{ unrelated}}}{\underbrace{\sqrt{\mathrm{var}(\hat{b})}}_{\text{allowance for sampling variation}}} = \frac{\hat{b}}{\sqrt{\mathrm{var}(\hat{b})}}$$

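23 Hypothesis testing (2): estimating var($\hat{b}$)
Earlier, we learned that
$$\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
The numerator simplifies because
$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} (X_i - \bar{X})Y_i - \sum_{i=1}^{n} (X_i - \bar{X})\bar{Y} = \sum_{i=1}^{n} (X_i - \bar{X})Y_i - \bar{Y}\overbrace{\sum_{i=1}^{n} (X_i - \bar{X})}^{=0}$$
$X_1, \ldots, X_n$ are assumed known and hence constants, so
$$\mathrm{var}(\hat{b}) = \mathrm{var}\left(\frac{\sum_{i=1}^{n} (X_i - \bar{X})Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 \,\mathrm{var}(Y_i)}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2} = \frac{\sigma^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2} = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
where $\sigma^2$ can be estimated using the MLE
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \{Y_i - (\hat{a} + \hat{b}X_i)\}^2}{n} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n}$$
Sometimes, the denominator of $\hat{\sigma}^2$ uses n - 2 to give an unbiased estimator for $\sigma^2$.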

24 Hypothesis testing (3)
For large n, we find:
$$z = \frac{\hat{b} - 0}{\sqrt{\mathrm{var}(\hat{b})}} = \frac{\hat{b} - 0}{\hat{\sigma} \Big/ \sqrt{\sum_{i=1}^{n} X_i^2 - n(\bar{X})^2}} = \frac{0.59}{[\ldots] \Big/ \sqrt{38493 - 8(67.375)^2}} = [\ldots] > 1.96$$
For small n, we replace the critical value of 1.96 by a new critical value that depends on the degrees of freedom (df), defined as df = n - 2.
[Table: critical values for selected df = n - 2, up to df > 120]
In our study, df = 8 - 2 = 6 and the critical value is 2.447. Since z > 2.447, we arrive at the same conclusion of rejecting $H_0: b = 0$. We are rarely interested in a one-sided test of b.
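The test statistic can be computed directly from the sums. The sketch below is illustrative only: the data are hypothetical (so the resulting z differs from the slide's), `slope_z` is an assumed helper name, and the only critical values used are the two quoted above (1.96 for large n, 2.447 for df = 6).

```python
import math

# Hypothetical scores for 8 students (NOT the slide's data).
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]

CRITICAL_VALUE_DF6 = 2.447  # quoted in the text for df = 8 - 2 = 6


def slope_z(x, y, unbiased=False):
    """Test statistic for H0: b = 0.

    unbiased=False uses the MLE of sigma^2 (divisor n);
    unbiased=True uses the divisor n - 2.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2 if unbiased else n))
    return b_hat / (sigma_hat / math.sqrt(sxx))


z_mle = slope_z(midterm, final)
z_unbiased = slope_z(midterm, final, unbiased=True)
reject_h0 = abs(z_mle) > CRITICAL_VALUE_DF6
print(f"z (divisor n) = {z_mle:.3f}, z (divisor n - 2) = {z_unbiased:.3f}")
print("reject H0: b = 0 at the 5% level:", reject_h0)
```

The two versions of the statistic differ only by the factor $\sqrt{n/(n-2)}$, so the choice of divisor matters most when n is small.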

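25 95% Confidence and prediction intervals
Slope b
MLE (OLS): $\hat{b}$
95% confidence interval:
$$\hat{b} \pm 1.96\,\mathrm{SD}(\hat{b}) = \hat{b} \pm 1.96\,\hat{\sigma}\sqrt{\frac{1}{\sum_{i=1}^{n} X_i^2 - n(\bar{X})^2}}$$
Average value of Y given X, (a + bX)
MLE (OLS): $\hat{a} + \hat{b}X$
95% confidence interval:
$$\hat{a} + \hat{b}X \pm 1.96\,\mathrm{SD}(\hat{a} + \hat{b}X) = \hat{a} + \hat{b}X \pm 1.96\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n} X_i^2 - n(\bar{X})^2}}$$
Individual value of Y given X, (a + bX + e)
MLE (OLS): $\hat{a} + \hat{b}X + \overbrace{\hat{e}}^{0}$
95% interval (also called a prediction interval):
$$\hat{a} + \hat{b}X \pm 1.96\,\mathrm{SD}(\hat{a} + \hat{b}X + \hat{e}) = \hat{a} + \hat{b}X \pm 1.96\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n} X_i^2 - n(\bar{X})^2}}$$
For small values of n, 1.96 can be replaced by an appropriate value in the t-table.
Note that $\hat{a} + \hat{b}X = (\bar{Y} - \hat{b}\bar{X}) + \hat{b}X = \bar{Y} + \hat{b}(X - \bar{X})$.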
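The interval formulas translate directly into code. The following sketch assumes hypothetical data (not the slide's), uses the MLE of $\sigma^2$ with divisor n, and takes the critical value as a parameter; `intervals` is an illustrative name, and 2.447 is the df = 6 value quoted in the hypothesis-testing slide.

```python
import math

# Hypothetical scores for 8 students (NOT the slide's data).
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]


def intervals(x0, x, y, crit=1.96):
    """Fitted value at x0 with its confidence and prediction intervals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / n)  # MLE of sigma (divisor n)

    fit = a_hat + b_hat * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = crit * sigma_hat * math.sqrt(leverage)      # average value of Y
    half_pi = crit * sigma_hat * math.sqrt(1 + leverage)  # individual value of Y
    return fit, (fit - half_ci, fit + half_ci), (fit - half_pi, fit + half_pi)


# n = 8 is small, so use the df = 6 critical value instead of 1.96.
fit_65, ci_65, pi_65 = intervals(65, midterm, final, crit=2.447)
print(f"fitted value at X = 65: {fit_65:.2f}")
print(f"confidence interval: ({ci_65[0]:.2f}, {ci_65[1]:.2f})")
print(f"prediction interval: ({pi_65[0]:.2f}, {pi_65[1]:.2f})")
```

The prediction interval is always wider than the confidence interval at the same X because of the extra 1 under the square root, and both widen as X moves away from $\bar{X}$.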

26 Example
[Figure: Final against Midterm, showing the confidence and prediction intervals]

27 Goodness-of-fit: $R^2$
How well does the model fit the data? We answer this question using a goodness-of-fit measure called the coefficient of determination $R^2$ ("R-square"). $R^2$ can be justified as follows. Consider using n observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ of (X, Y) to predict the next observation, $Y_{n+1}$, of Y. Two possible estimates are:
(1) $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and
(2) $\hat{Y}_i = \hat{a} + \hat{b}X_i$
How do they compare? Since $Y_{n+1}$ is unknown, we cannot tell whether $\bar{Y}$ or $\hat{Y}_i$ is closer to $Y_{n+1}$. However, we can compare their performances in predicting the observed $Y_i$, i = 1, ..., n. For $Y_i$, the errors incurred by these estimates are $(Y_i - \bar{Y})$ and $(Y_i - \hat{Y}_i)$. $R^2$ is then defined as
$$R^2 = \frac{\text{Total error using } \bar{Y} - \text{Total error using } \hat{Y}_i}{\text{Total error using } \bar{Y}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

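28 $R^2$
$$R^2 = \frac{\overbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}^{SST} - \overbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}^{SSE}}{\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{SST}} = \frac{SST - SSE}{SST}$$
[Figure: two scatter plots of Final against Midterm showing the errors about $\bar{Y}$ and the errors about the fitted line]
SST is the sum of the squared errors about $\bar{Y}$, whereas SSE is the sum of the squared errors about the fitted line; $SSE \le SST$ since SSE is the total error from the least squares line.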

29 Example
For a simple linear regression model, a simple relationship exists between $R^2$ and r:
$$R^2 = \mathrm{corr}(X, Y)^2 = r^2 = [\ldots]^2 = 0.507$$
in our example between Midterm and Final score, so the error is reduced by about half compared to predicting without the model.
Multiplying $R^2$ by 100% gives the percent variation explained, $R^2 \times 100\% = 50.7\%$, which tells us that about 50.7% of the differences in Final score between students can be accounted for by their Midterm score, while the remaining differences, i.e., 49.3%, are due to other (unknown) factors.
When there is more than one predictor, a single r between X and Y cannot be calculated; in that case, $R^2$ is the squared correlation between the outcome and its fitted values from the predictors.
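The identity $R^2 = r^2$ for simple linear regression can be verified numerically. This is a sketch on hypothetical data (not the slide's, so the value differs from 0.507); `r_squared_and_r` is an illustrative helper name.

```python
import math

# Hypothetical scores for 8 students (NOT the slide's data).
midterm = [35, 48, 60, 66, 70, 78, 85, 92]
final = [45, 52, 70, 58, 72, 60, 88, 79]


def r_squared_and_r(x, y):
    """Return (R^2 from SST and SSE, Pearson r) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total error using the mean of Y

    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))

    return (sst - sse) / sst, sxy / math.sqrt(sxx * sst)


r2, r = r_squared_and_r(midterm, final)
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")
```

The two printed values agree up to rounding, and `r2 * 100` gives the percent of variation in the Final score explained by the Midterm score for this hypothetical sample.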


More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

Chapter 7. Scatterplots, Association, and Correlation

Chapter 7. Scatterplots, Association, and Correlation Chapter 7 Scatterplots, Association, and Correlation Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 29 Objective In this chapter, we study relationships! Instead, we investigate

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

Ch Inference for Linear Regression

Ch Inference for Linear Regression Ch. 12-1 Inference for Linear Regression ACT = 6.71 + 5.17(GPA) For every increase of 1 in GPA, we predict the ACT score to increase by 5.17. population regression line β (true slope) μ y = α + βx mean

More information

where x and ȳ are the sample means of x 1,, x n

where x and ȳ are the sample means of x 1,, x n y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

ECON 450 Development Economics

ECON 450 Development Economics ECON 450 Development Economics Statistics Background University of Illinois at Urbana-Champaign Summer 2017 Outline 1 Introduction 2 3 4 5 Introduction Regression analysis is one of the most important

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean Interval estimation October 3, 2018 STAT 151 Class 7 Slide 1 Pandemic data Treatment outcome, X, from n = 100 patients in a pandemic: 1 = recovered and 0 = not recovered 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 0

More information

Chapter 5 Friday, May 21st

Chapter 5 Friday, May 21st Chapter 5 Friday, May 21 st Overview In this Chapter we will see three different methods we can use to describe a relationship between two quantitative variables. These methods are: Scatterplot Correlation

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

The Multinomial Model

The Multinomial Model The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Two-Variable Regression Model: The Problem of Estimation

Two-Variable Regression Model: The Problem of Estimation Two-Variable Regression Model: The Problem of Estimation Introducing the Ordinary Least Squares Estimator Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Two-Variable

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

s e, which is large when errors are large and small Linear regression model

s e, which is large when errors are large and small Linear regression model Linear regression model we assume that two quantitative variables, x and y, are linearly related; that is, the the entire population of (x, y) pairs are related by an ideal population regression line y

More information

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Steps to take to do the descriptive part of regression analysis:

Steps to take to do the descriptive part of regression analysis: STA 2023 Simple Linear Regression: Least Squares Model Steps to take to do the descriptive part of regression analysis: A. Plot the data on a scatter plot. Describe patterns: 1. Is there a strong, moderate,

More information

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Chapte The McGraw-Hill Companies, Inc. All rights reserved. 12er12 Chapte Bivariate i Regression (Part 1) Bivariate Regression Visual Displays Begin the analysis of bivariate data (i.e., two variables) with a scatter plot. A scatter plot - displays each observed

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information