Estadística II Chapter 5. Regression analysis (second part)


1 Estadística II Chapter 5. Regression analysis (second part)

2 Chapter 5. Regression analysis (second part)

Contents
- Diagnostics: residual analysis
- The ANOVA (ANalysis Of VAriance) decomposition
- Nonlinear relationships and linearizing transformations
- The simple linear regression model in matrix notation
- Introduction to multiple linear regression

3 Chapter 5. Regression analysis (second part)

Bibliography references
- Newbold, P., Estadística para los negocios y la economía (1997), Chapters 12, 13 and 14
- Peña, D., Regresión y Análisis de Experimentos (2002), Chapters 5 and 6

4 Regression diagnostics — Diagnostics

The theoretical assumptions for the simple linear regression model with one response variable y and one explanatory variable x are:
- Linearity: y_i = β_0 + β_1 x_i + u_i, for i = 1, ..., n
- Homogeneity: E[u_i] = 0, for i = 1, ..., n
- Homoscedasticity: Var[u_i] = σ², for i = 1, ..., n
- Independence: u_i and u_j are independent for i ≠ j
- Normality: u_i ∼ N(0, σ²), for i = 1, ..., n

We study how to apply diagnostic procedures to test whether these assumptions are appropriate for the available data (x_i, y_i). These procedures are based on the analysis of the residuals e_i = y_i − ŷ_i.

5 Diagnostics: scatterplots

The simplest diagnostic procedure is based on a visual examination of the scatterplot of (x_i, y_i). Often this simple but powerful method reveals patterns suggesting whether the theoretical model might be appropriate or not. We illustrate its application on a classical example: consider the following four datasets.

6 Diagnostics: scatterplots — The Anscombe datasets

TABLE 3-10  Four Data Sets (values as published by Anscombe, 1973)

Dataset 1      Dataset 2      Dataset 3      Dataset 4
X     Y        X     Y        X     Y        X     Y
10    8.04     10    9.14     10    7.46      8    6.58
 8    6.95      8    8.14      8    6.77      8    5.76
13    7.58     13    8.74     13   12.74      8    7.71
 9    8.81      9    8.77      9    7.11      8    8.84
11    8.33     11    9.26     11    7.81      8    8.47
14    9.96     14    8.10     14    8.84      8    7.04
 6    7.24      6    6.13      6    6.08      8    5.25
 4    4.26      4    3.10      4    5.39     19   12.50
12   10.84     12    9.13     12    8.15      8    5.56
 7    4.82      7    7.26      7    6.42      8    7.91
 5    5.68      5    4.74      5    5.73      8    6.89

SOURCE: F. J. Anscombe, op. cit.

7 Diagnostics: scatterplots — The Anscombe example

The estimated regression model for each of the four previous datasets is the same:
ŷ_i = 3.0 + 0.5 x_i,   with n = 11, x̄ = 9.0, ȳ = 7.5, r_xy = 0.816
The estimated standard error of the slope estimator β̂_1, √(s²_R / ((n−1) s²_x)), takes the value 0.118 in all four cases. The corresponding T statistic takes the value T = 0.5/0.118 = 4.24. But the corresponding scatterplots show that the four datasets are quite different. Which conclusions could we reach from these diagrams?
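As a quick check of the claim above, the following sketch fits the least-squares line to dataset 1 of the table with NumPy (the data values are taken from the Anscombe table; any of the four datasets gives essentially the same numbers):

```python
import numpy as np

# Anscombe's dataset 1 (values as published by Anscombe, 1973)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# least-squares fit: slope b1 and intercept b0
b1, b0 = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]

print(round(b0, 2), round(b1, 2), round(r, 3))  # ≈ 3.0, 0.5, 0.816
```

Repeating the fit for datasets 2 to 4 reproduces the same estimates, which is exactly why the scatterplots, not the fitted coefficients, reveal the differences.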

8 Diagnostics: scatterplots — Anscombe data scatterplots

[FIGURE 3-29: Scatterplots for the four data sets of Table 3-10. SOURCE: F. J. Anscombe, op. cit.]

9 Residual analysis — Further analysis of the residuals

If the observation of the scatterplot is not sufficient to reject the model, a further step is to use diagnostic methods based on the analysis of the residuals e_i = y_i − ŷ_i. This analysis starts by standardizing the residuals, that is, dividing them by the residual (quasi-)standard deviation s_R. The resulting quantities e_i / s_R are known as standardized residuals. Under the assumptions of the linear regression model, the standardized residuals are approximately independent standard normal random variables; a plot of them should show no clear pattern.

10 Residual analysis — Residual plots

Several types of residual plots can be constructed. The most common ones are:
- plot of standardized residuals vs. x
- plot of standardized residuals vs. ŷ (the predicted responses)
Deviations from the model hypotheses result in patterns in these plots, which should be visually recognizable.

11 Residual plot examples: consistency with the theoretical model

12 Residual plot examples: nonlinearity

13 Residual plot examples: heteroscedasticity

14 Residual analysis — Outliers

In a plot of the regression line we may observe outliers, that is, data points that show significant deviations from the regression line (or from the remaining data). The parameter estimators for the regression model, β̂_0 and β̂_1, are very sensitive to these outliers. It is important to identify the outliers and make sure that they really are valid data.

15 Residual analysis — Normality of the errors

One of the theoretical assumptions of the linear regression model is that the errors follow a normal distribution. This assumption can be checked visually from the residuals e_i, using different approaches:
- inspection of the frequency histogram of the residuals
- inspection of the normal probability plot of the residuals (significant deviations of the data from the straight line in the plot correspond to significant departures from the normality assumption)
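The normal probability plot can be built without any plotting library by pairing the sorted residuals with the theoretical N(0, 1) quantiles. A sketch, using simulated stand-in residuals (data and seed are made up; `NormalDist` is Python's standard-library normal distribution):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

# stand-in standardized residuals (hypothetical: drawn from N(0, 1))
resid = rng.normal(size=200)

# plot coordinates: empirical quantiles vs. theoretical N(0,1) quantiles
# at the plotting positions (i - 0.5)/n
q_emp = np.sort(resid)
p = (np.arange(1, resid.size + 1) - 0.5) / resid.size
q_theo = np.array([NormalDist().inv_cdf(pi) for pi in p])

# for normal residuals the points lie close to a straight line,
# so the correlation between the two sets of quantiles is near 1
r = np.corrcoef(q_emp, q_theo)[0, 1]
print(round(r, 3))
```

Plotting `q_theo` against `q_emp` gives the normal probability plot itself; a correlation well below 1 signals curvature in the plot and hence a departure from normality.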

16 The ANOVA decomposition — Introduction

ANOVA: ANalysis Of VAriance. When fitting the simple linear regression model ŷ_i = β̂_0 + β̂_1 x_i to a data set (x_i, y_i), i = 1, ..., n, we may identify three sources of variability in the responses:
- variability associated with the model: SSM = Σ_{i=1}^n (ŷ_i − ȳ)², where the initials SS denote "sum of squares" and M refers to the model
- variability of the residuals: SSR = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n e_i²
- total variability: SST = Σ_{i=1}^n (y_i − ȳ)²
The ANOVA decomposition states that SST = SSM + SSR.

17 The ANOVA decomposition — The coefficient of determination R²

The ANOVA decomposition states that SST = SSM + SSR. Note that y_i − ȳ = (y_i − ŷ_i) + (ŷ_i − ȳ).
- SSM = Σ_{i=1}^n (ŷ_i − ȳ)² measures the variation in the responses due to the regression model (explained by the predicted values ŷ_i)
- Thus, the ratio SSR/SST is the proportion of the variation in the responses that is not explained by the regression model
- The ratio R² = SSM/SST = 1 − SSR/SST is the proportion of the variation in the responses that is explained by the regression model. It is known as the coefficient of determination
- The coefficient of determination satisfies R² = r²_xy (the squared correlation coefficient)
- For example, if R² = 0.85 the variable x explains 85% of the variation in the response variable y
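Both identities on this slide (SST = SSM + SSR, and R² = r²_xy) hold exactly for any least-squares fit with an intercept, and can be verified numerically. A sketch with a small made-up data set:

```python
import numpy as np

# small hypothetical data set
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8])

b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x
ybar = y.mean()

SSM = np.sum((yhat - ybar)**2)   # variability explained by the model
SSR = np.sum((y - yhat)**2)      # variability of the residuals
SST = np.sum((y - ybar)**2)      # total variability

R2 = SSM / SST
r = np.corrcoef(x, y)[0, 1]

print(np.isclose(SST, SSM + SSR))   # ANOVA decomposition: True
print(np.isclose(R2, r**2))         # R^2 = r_xy^2: True
```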

18 The ANOVA decomposition — ANOVA table

Source of variability   SS    DF       Mean                  F ratio
Model                   SSM   1        SSM/1                 SSM/s²_R
Residuals/errors        SSR   n − 2    SSR/(n−2) = s²_R
Total                   SST   n − 1

19 The ANOVA decomposition — ANOVA hypothesis testing

Hypothesis test: H_0: β_1 = 0 vs. H_1: β_1 ≠ 0. Consider the ratio
F = (SSM/1) / (SSR/(n−2)) = SSM / s²_R
Under H_0, F follows an F_{1,n−2} distribution. Test at significance level α: reject H_0 if F > F_{1,n−2;α}. How does this result relate to the test based on the Student-t we saw in Chapter 4? They are equivalent.
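The equivalence with the Student-t test from Chapter 4 is the algebraic identity F = T², which the following sketch checks on a made-up data set:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 6.3, 6.8, 8.1, 9.2, 9.9])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x

SSM = np.sum((yhat - y.mean())**2)
SSR = np.sum((y - yhat)**2)
sR2 = SSR / (n - 2)              # residual (quasi-)variance

F = SSM / sR2                    # ANOVA F statistic

se_b1 = np.sqrt(sR2 / np.sum((x - x.mean())**2))
T = b1 / se_b1                   # Student-t statistic for H0: beta_1 = 0

print(np.isclose(F, T**2))       # the two tests are equivalent: True
```

The identity follows from SSM = β̂_1² Σ(x_i − x̄)², so F = β̂_1² Σ(x_i − x̄)² / s²_R = T².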

20 The ANOVA decomposition — Excel output

[Slide shows the ANOVA table as produced by Excel's regression output; the screenshot is not reproduced here.]

21 Nonlinear relationships and linearizing transformations — Introduction

Consider the case when the deterministic part f(x_i; a, b) of the response in the model y_i = f(x_i; a, b) + u_i, i = 1, ..., n, is a nonlinear function of x that depends on two parameters a and b (for example, f(x; a, b) = a b^x). In some cases we may apply transformations to the data to linearize them; we are then able to apply the linear regression procedure. From the original data (x_i, y_i) we obtain the transformed data (x′_i, y′_i). The parameters β_0 and β_1 corresponding to the linear relation between x′_i and y′_i are transformations of the parameters a and b.

22 Nonlinear relationships and linearizing transformations — Linearizing transformations

Examples of linearizing transformations:
- If y = f(x; a, b) = a x^b then log y = log a + b log x. We take y′ = log y, x′ = log x, β_0 = log a, β_1 = b
- If y = f(x; a, b) = a b^x then log y = log a + (log b) x. We take y′ = log y, x′ = x, β_0 = log a, β_1 = log b
- If y = f(x; a, b) = 1/(a + bx) then 1/y = a + bx. We take y′ = 1/y, x′ = x, β_0 = a, β_1 = b
- If y = f(x; a, b) = log(a x^b) then y = log a + b log x. We take y′ = y, x′ = log x, β_0 = log a, β_1 = b
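The second transformation in the list can be sketched end to end: generate noiseless data from y = a b^x, fit a straight line to (x, log y), and undo the transformation to recover a and b (the values a = 2, b = 1.3 are made up for illustration):

```python
import numpy as np

# hypothetical exponential relationship y = a * b**x (noiseless)
a_true, b_true = 2.0, 1.3
x = np.arange(1.0, 11.0)
y = a_true * b_true**x

# linearize: log y = log a + (log b) x, then fit a straight line
beta1, beta0 = np.polyfit(x, np.log(y), 1)

# invert the parameter transformation: a = exp(beta_0), b = exp(beta_1)
a_hat, b_hat = np.exp(beta0), np.exp(beta1)

print(round(a_hat, 3), round(b_hat, 3))  # recovers 2.0 and 1.3
```

With noisy data the same recipe applies, but note that additive N(0, σ²) errors on the log scale correspond to multiplicative errors on the original scale.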

23 A matrix treatment of linear regression — Introduction

Recall the simple linear regression model, y_i = β_0 + β_1 x_i + u_i, i = 1, ..., n. If we write one equation for each one of the observations, we have

y_1 = β_0 + β_1 x_1 + u_1
y_2 = β_0 + β_1 x_2 + u_2
 ⋮
y_n = β_0 + β_1 x_n + u_n

24 A matrix treatment of linear regression — The model in matrix form

We can write the preceding equations in matrix form as

[y_1]   [β_0 + β_1 x_1]   [u_1]
[y_2] = [β_0 + β_1 x_2] + [u_2]
[ ⋮ ]   [      ⋮      ]   [ ⋮ ]
[y_n]   [β_0 + β_1 x_n]   [u_n]

and, splitting the parameters β from the variables x_i,

[y_1]   [1  x_1]          [u_1]
[y_2] = [1  x_2] [β_0]  + [u_2]
[ ⋮ ]   [⋮   ⋮ ] [β_1]    [ ⋮ ]
[y_n]   [1  x_n]          [u_n]

25 A matrix treatment of linear regression — The regression model in matrix form

The preceding matrix relationship can be written compactly as

y = Xβ + u

where y is the response vector, X is the explanatory variables matrix (or experimental design matrix), β is the vector of parameters, and u is the error vector.

26 The regression model in matrix form — Covariance matrix of the errors

We denote by Cov(u) the n × n covariance matrix of the errors. Its (i, j)-th element is cov(u_i, u_j) = 0 if i ≠ j, and cov(u_i, u_i) = Var(u_i) = σ². Thus, Cov(u) is the identity matrix I multiplied by σ²:

         [σ²  0  ⋯  0 ]
Cov(u) = [0  σ²  ⋯  0 ] = σ² I
         [⋮   ⋮  ⋱  ⋮ ]
         [0   0  ⋯  σ²]

27 The regression model in matrix form — Least-squares estimation

The least-squares parameter estimate β̂ is the unique solution of the 2 × 2 matrix equation (check the dimensions)

(X^T X) β̂ = X^T y,   that is,   β̂ = (X^T X)^{-1} X^T y.

The vector ŷ = (ŷ_i) of fitted responses is given by ŷ = X β̂, and the residual vector is defined as e = y − ŷ.
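The normal equations above translate directly into code. A minimal sketch, with data chosen (hypothetically) to lie exactly on the line y = 1 + 2x so that the solution is known in advance:

```python
import numpy as np

# data lying exactly on the line y = 1 + 2x (made-up example)
x = np.array([1., 2., 3., 4., 5.])
y = np.array([3., 5., 7., 9., 11.])

# design matrix X: a column of ones and a column of x values
X = np.column_stack([np.ones_like(x), x])

# solve the normal equations (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

yhat = X @ beta_hat          # fitted responses yhat = X beta_hat
e = y - yhat                 # residual vector e = y - yhat

print(beta_hat)              # [1. 2.]
```

Solving the linear system with `np.linalg.solve` is preferred in practice over explicitly forming (X^T X)^{-1}, since matrix inversion is slower and numerically less stable.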

28 The multiple linear regression model — Introduction

The simple linear regression model predicts the value of a response y from the value of one explanatory variable x. In many applications we wish to predict the response y from the values of several explanatory variables x_1, ..., x_k. For example:
- forecast the value of a house as a function of its size, location, layout and number of bathrooms
- forecast the size of a parliament as a function of the population, its rate of growth, the number of political parties with parliamentary representation, etc.

29 The multiple linear regression model — The model

The multiple linear regression model predicts a response y from several explanatory variables x_1, ..., x_k. If we have n observations, then for i = 1, ..., n,

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_k x_ik + u_i

We assume that the errors u_i are independent random variables following a N(0, σ²) distribution.

30 The multiple linear regression model — The least-squares fit

We have n observations and, for i = 1, ..., n,

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_k x_ik + u_i

We wish to fit to the data (x_i1, x_i2, ..., x_ik, y_i) a hyperplane of the form

ŷ_i = β̂_0 + β̂_1 x_i1 + β̂_2 x_i2 + ⋯ + β̂_k x_ik

The residual for observation i is e_i = y_i − ŷ_i. We obtain the parameter estimates β̂_j as the values that minimize the sum of the squares of the residuals.

31 The multiple linear regression model — The model in matrix form

We can write the model as a matrix relationship,

[y_1]   [1  x_11  x_12  ⋯  x_1k] [β_0]   [u_1]
[y_2] = [1  x_21  x_22  ⋯  x_2k] [β_1] + [u_2]
[ ⋮ ]   [⋮    ⋮    ⋮        ⋮  ] [ ⋮ ]   [ ⋮ ]
[y_n]   [1  x_n1  x_n2  ⋯  x_nk] [β_k]   [u_n]

and in compact form as y = Xβ + u, where y is the response vector, X is the explanatory variables matrix (or experimental design matrix), β is the vector of parameters, and u is the error vector.

32 The multiple linear regression model — Least-squares estimation

The least-squares parameter estimate β̂ is the unique solution of the (k+1) × (k+1) matrix equation (check the dimensions)

(X^T X) β̂ = X^T y,

and, as in the k = 1 case (simple linear regression), we have β̂ = (X^T X)^{-1} X^T y. The vector ŷ = (ŷ_i) of fitted responses is given by ŷ = X β̂, and the residual vector is defined as e = y − ŷ.

33 The multiple linear regression model — Variance estimation

For the multiple linear regression model, an unbiased estimator of the error variance σ² is the residual (quasi-)variance

s²_R = Σ_{i=1}^n e_i² / (n − k − 1)

Note that for the simple linear regression case (k = 1) the denominator was n − 2.
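The multiple-regression fit and the variance estimator combine into a few lines. A sketch on simulated data (design, true coefficients β = (1, 2, −0.5), σ = 0.1 and the seed are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 2

# hypothetical design: intercept plus k = 2 standard-normal regressors
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(0.0, 0.1, size=n)   # true sigma = 0.1

# least-squares estimate via the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# residual (quasi-)variance with denominator n - k - 1, not n - 2
e = y - X @ beta_hat
sR2 = e @ e / (n - k - 1)

print(np.round(beta_hat, 1))  # close to [ 1.   2.  -0.5]
print(sR2)                    # close to sigma^2 = 0.01
```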

34 The multiple linear regression model — The sampling distribution of β̂

Under the model assumptions, the least-squares estimator β̂ of the parameter vector β follows a multivariate normal distribution.
- E(β̂) = β (it is an unbiased estimator)
- The covariance matrix of β̂ is Cov(β̂) = σ² (X^T X)^{-1}
- We estimate Cov(β̂) by s²_R (X^T X)^{-1}
- This estimate provides estimates s²(β̂_j) of the variances Var(β̂_j); s(β̂_j) is the standard error of the estimator β̂_j
- If we standardize β̂_j we have (β̂_j − β_j) / s(β̂_j) ∼ t_{n−k−1} (the Student-t distribution)

35 The multiple linear regression model — Inference on the parameters β_j

Confidence interval for the partial coefficient β_j at confidence level 1 − α:

β̂_j ± t_{n−k−1;α/2} s(β̂_j)

Hypothesis test for H_0: β_j = 0 vs. H_1: β_j ≠ 0 at significance level α: reject H_0 if |T| > t_{n−k−1;α/2}, where the test statistic is T = β̂_j / s(β̂_j).
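The standard errors s(β̂_j) and the statistics T come straight from s²_R (X^T X)^{-1}. A sketch on simulated data where x_2 is (by construction) irrelevant; all data values and the seed are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2

# hypothetical design: intercept plus two standard-normal regressors
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 3.0, 0.0]) + rng.normal(size=n)   # beta_2 = 0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sR2 = e @ e / (n - k - 1)

# estimated covariance matrix s_R^2 (X^T X)^{-1} and standard errors
cov_beta = sR2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))

# test statistics T = beta_hat_j / s(beta_hat_j) for H0: beta_j = 0
T = beta_hat / se
print(np.round(T, 2))
```

Completing the test only requires comparing |T| against the quantile t_{n−k−1;α/2} (available, for instance, as `scipy.stats.t.ppf(1 - alpha/2, n - k - 1)`); here T for the truly nonzero coefficient β_1 comes out large, while T for β_2 stays small.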

36 The ANOVA decomposition — The multivariate case

When fitting the multiple linear regression model ŷ_i = β̂_0 + β̂_1 x_i1 + ⋯ + β̂_k x_ik to a data set (x_i1, ..., x_ik, y_i), i = 1, ..., n, we may identify three sources of variability in the responses:
- variability associated with the model: SSM = Σ_{i=1}^n (ŷ_i − ȳ)², where the initials SS denote "sum of squares" and M refers to the model
- variability of the residuals: SSR = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n e_i²
- total variability: SST = Σ_{i=1}^n (y_i − ȳ)²
The ANOVA decomposition states that SST = SSM + SSR.

37 The ANOVA decomposition — The coefficient of determination R²

The ANOVA decomposition states that SST = SSM + SSR. Note that y_i − ȳ = (y_i − ŷ_i) + (ŷ_i − ȳ).
- SSM = Σ_{i=1}^n (ŷ_i − ȳ)² measures the variation in the responses due to the regression model (explained by the predicted values ŷ_i)
- Thus, the ratio SSR/SST is the proportion of the variation in the responses that is not explained by the regression model
- The ratio R² = SSM/SST = 1 − SSR/SST is the proportion of the variation in the responses that is explained by the explanatory variables. It is known as the coefficient of multiple determination
- For example, if R² = 0.85 the variables x_1, ..., x_k explain 85% of the variation in the response variable y

38 The ANOVA decomposition — ANOVA table

Source of variability   SS    DF         Mean                       F ratio
Model                   SSM   k          SSM/k                      (SSM/k)/s²_R
Residuals/errors        SSR   n − k − 1  SSR/(n−k−1) = s²_R
Total                   SST   n − 1

39 The ANOVA decomposition — ANOVA hypothesis testing

Hypothesis test: H_0: β_1 = β_2 = ⋯ = β_k = 0 vs. H_1: β_j ≠ 0 for some j = 1, ..., k.
- H_0: the response does not depend on any x_j
- H_1: the response depends on at least one x_j
- Interpretation of β̂_j: keeping everything else the same, a one-unit increase in x_j produces, on average, a change of β̂_j in y
Consider the ratio

F = (SSM/k) / (SSR/(n−k−1)) = (SSM/k) / s²_R

Under H_0, F follows an F_{k,n−k−1} distribution. Test at significance level α: reject H_0 if F > F_{k,n−k−1;α}.


Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

UNIT 12 ~ More About Regression

UNIT 12 ~ More About Regression ***SECTION 15.1*** The Regression Model When a scatterplot shows a relationship between a variable x and a y, we can use the fitted to the data to predict y for a given value of x. Now we want to do tests

More information

Multiple Regression Analysis: Heteroskedasticity

Multiple Regression Analysis: Heteroskedasticity Multiple Regression Analysis: Heteroskedasticity y = β 0 + β 1 x 1 + β x +... β k x k + u Read chapter 8. EE45 -Chaiyuth Punyasavatsut 1 topics 8.1 Heteroskedasticity and OLS 8. Robust estimation 8.3 Testing

More information

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 )

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 ) Multiple Linear Regression is used to relate a continuous response (or dependent) variable Y to several explanatory (or independent) (or predictor) variables X 1, X 2,, X k assumes a linear relationship

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

Regression Analysis: Basic Concepts

Regression Analysis: Basic Concepts The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

More information

Bivariate data analysis

Bivariate data analysis Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green

More information

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik

MAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,

More information

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv). Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,

More information

Econ 3790: Statistics Business and Economics. Instructor: Yogesh Uppal

Econ 3790: Statistics Business and Economics. Instructor: Yogesh Uppal Econ 3790: Statistics Business and Economics Instructor: Yogesh Uppal Email: yuppal@ysu.edu Chapter 14 Covariance and Simple Correlation Coefficient Simple Linear Regression Covariance Covariance between

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Correlation. A statistics method to measure the relationship between two variables. Three characteristics Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction

More information

Correlation 1. December 4, HMS, 2017, v1.1

Correlation 1. December 4, HMS, 2017, v1.1 Correlation 1 December 4, 2017 1 HMS, 2017, v1.1 Chapter References Diez: Chapter 7 Navidi, Chapter 7 I don t expect you to learn the proofs what will follow. Chapter References 2 Correlation The sample

More information

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear relationship between: - one independent variable X and -

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Single and multiple linear regression analysis

Single and multiple linear regression analysis Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Analisi Statistica per le Imprese

Analisi Statistica per le Imprese , Analisi Statistica per le Imprese Dip. di Economia Politica e Statistica 4.3. 1 / 33 You should be able to:, Underst model building using multiple regression analysis Apply multiple regression analysis

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

ST Correlation and Regression

ST Correlation and Regression Chapter 5 ST 370 - Correlation and Regression Readings: Chapter 11.1-11.4, 11.7.2-11.8, Chapter 12.1-12.2 Recap: So far we ve learned: Why we want a random sample and how to achieve it (Sampling Scheme)

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal yuppal@ysu.edu Sampling Distribution of b 1 Expected value of b 1 : Variance of b 1 : E(b 1 ) = 1 Var(b 1 ) = σ 2 /SS x Estimate of

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511 STAT 511 Lecture : Simple linear regression Devore: Section 12.1-12.4 Prof. Michael Levine December 3, 2018 A simple linear regression investigates the relationship between the two variables that is not

More information

Correlation in Linear Regression

Correlation in Linear Regression Vrije Universiteit Amsterdam Research Paper Correlation in Linear Regression Author: Yura Perugachi-Diaz Student nr.: 2566305 Supervisor: Dr. Bartek Knapik May 29, 2017 Faculty of Sciences Research Paper

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

Statistics II Lesson 1. Inference on one population. Year 2009/10

Statistics II Lesson 1. Inference on one population. Year 2009/10 Statistics II Lesson 1. Inference on one population Year 2009/10 Lesson 1. Inference on one population Contents Introduction to inference Point estimators The estimation of the mean and variance Estimating

More information