Analisi Statistica per le Imprese


Analisi Statistica per le Imprese (Statistical Analysis for Business)
Dip. di Economia Politica e Statistica (Department of Political Economy and Statistics)
4.3

You should be able to:

- Understand model building using multiple regression analysis
- Apply multiple regression analysis to business decision-making situations
- Analyze and interpret the computer output for a multiple regression model
- Test the significance of the independent variables in a multiple regression model
- Incorporate qualitative variables into the regression model by using dummy variables

The systematic part

One may need a mathematical model to quantify the relationship between a response variable $Y$ and $k$ explanatory variables $X_1, \dots, X_k$:

$$ y = f(X_1, \dots, X_k) $$

The multiple linear regression model specifies the functional relationship as

$$ f(X_1, \dots, X_k) = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{1} $$

Geometrically this corresponds to a hyperplane in $k$ dimensions. The model is extremely useful because:

1. It has an intuitive geometrical interpretation
2. Its parameters are simple to estimate

The stochastic part

The model always includes a random component, which identifies the stochastic part. This can be expressed as follows:

$$ Y = \underbrace{f(X_1, \dots, X_k)}_{\text{systematic}} + \underbrace{\varepsilon}_{\text{stochastic}} \tag{2} $$

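The specification

Standard notation: for each statistical unit $i = 1, \dots, n$,

$$ Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i \tag{3} $$

Matrix notation:

$$ Y = X\beta + \varepsilon \tag{4} $$

- $Y$: $(n \times 1)$ vector of the $n$ observations of the dependent variable
- $X$: $(n \times k)$ matrix of $k$ regressors, each with $n$ observations
- $\beta$: $(k \times 1)$ vector of $k$ parameters
- $\varepsilon$: $(n \times 1)$ vector of $n$ error terms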

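The Matrices

$$
X = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1k} \\
X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nk}
\end{pmatrix};
\quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix};
\quad
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix};
\quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
$$

The matrix $X$ has a first column of ones if the model includes an intercept. In this case the intercept is $\beta_1$ in the matrix notation.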
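To make the role of the column of ones concrete, here is a minimal sketch of how a design matrix with an intercept could be assembled with NumPy. The regressor values and the variable names (`price`, `advertising`) are hypothetical and are not taken from the slides.

```python
import numpy as np

# Hypothetical regressor values (not the slides' data): n = 4 units, 2 regressors.
price = np.array([5.5, 7.5, 8.0, 6.8])
advertising = np.array([3.3, 3.3, 3.0, 4.5])

# The first column of ones is the regressor whose coefficient beta_1 is the intercept.
X = np.column_stack([np.ones(len(price)), price, advertising])

n, k = X.shape                      # n observations, k parameters (intercept included)
rank = np.linalg.matrix_rank(X)     # full column rank means rank == k
```

With these values `X` has shape `(4, 3)`, so `k` counts the intercept as one of the parameters, as in the matrix notation above.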

Standard Assumptions

- The functional relationship must be linear
- Covariates must have a deterministic nature
- The $X$ matrix has full rank
- The error term has a null expected value: $E[\varepsilon_i] = 0$
- The error term is homoskedastic: $Var[\varepsilon_i] = \sigma^2$
- The error terms are not correlated: $Cov[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$

OLS estimator

The OLS estimator in multiple linear regression is the vector $\hat{\beta}$ that minimizes the following function of $\beta$, where $X_i$ is the $i$-th row of the $X$ matrix:

$$ \min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} \left( Y_i - X_i \beta \right)^2 \tag{5} $$

So, with residuals

$$ e = Y - X\hat{\beta} \tag{6} $$

the estimator is

$$ \hat{\beta} = \left( X'X \right)^{-1} X'Y \tag{7} $$

Notice that the matrix $X'X$ (cross-product matrix) must be of full rank in order to be invertible.
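Equation (7) can be checked numerically. The sketch below is only an illustration: the data are synthetic (the seed, sample size, and `beta_true` are assumptions, not values from the slides), and the normal equations are solved with `np.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration, not the slides' data set).
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # first column = intercept
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Equation (7): solve the normal equations (X'X) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equation (6): residuals of the fitted model.
e = y - X @ beta_hat

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

At the minimum the residuals are orthogonal to every column of `X`, so `X.T @ e` is zero up to floating-point error; this is the first-order condition behind (7).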

OLS estimator properties

$\hat{\beta}$ is BLUE (Best Linear Unbiased Estimator).

One can notice that $(X'X)^{-1}X'$ is a matrix of constant elements, therefore $\hat{\beta}$ is a linear transformation of $Y$. One can prove that $\hat{\beta}$ is an unbiased estimator as follows:

$$ \hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon \tag{8} $$

$$ E[\hat{\beta}] = \beta + (X'X)^{-1}X'E[\varepsilon] = \beta \tag{9} $$

One can also prove that, in the class of linear unbiased estimators, it is the one with the minimum variance.
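Unbiasedness, equation (9), can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed settings (fixed `X`, normal errors, 2000 replications, arbitrary `beta`); none of these numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps, sigma = 50, 2000, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # held fixed across replications
beta = np.array([2.0, -1.0, 0.5])

# Draw `reps` independent error vectors, one per column.
eps = rng.normal(scale=sigma, size=(n, reps))
Y = (X @ beta)[:, None] + eps

# OLS estimate for every replication at once; each column is one beta_hat.
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y)

# Averaging over replications approximates E[beta_hat].
mean_beta_hat = beta_hats.mean(axis=1)
```

The average of the simulated estimates should sit close to `beta`, with a gap that shrinks as `reps` grows.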

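Variance of the OLS estimator

The variance of the OLS estimator is calculated as follows:

$$ Var(\hat{\beta}) = E\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)' \right] \tag{10} $$

$$ Var(\hat{\beta}) = E\left[ (X'X)^{-1}X' \varepsilon\varepsilon' X (X'X)^{-1} \right] \tag{11} $$

$$ Var(\hat{\beta}) = (X'X)^{-1}X' E[\varepsilon\varepsilon'] X (X'X)^{-1} = \sigma^2 (X'X)^{-1} X'X (X'X)^{-1} \tag{12} $$

$$ Var(\hat{\beta}) = \sigma^2 (X'X)^{-1} \tag{13} $$

so for each parameter estimator

$$ Var(\hat{\beta}_j) = \sigma^2 \left[ (X'X)^{-1} \right]_{jj} \tag{14} $$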
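The simplification from (12) to (13) can be verified numerically. The following sketch uses an arbitrary synthetic design matrix and an assumed value of `sigma2`; both are illustrative and not from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 30, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T          # beta_hat = A @ Y, hence beta_hat - beta = A @ eps

# Equation (12): A E[eps eps'] A' with E[eps eps'] = sigma^2 I.
sandwich = A @ (sigma2 * np.eye(n)) @ A.T

# Equation (13): the simplified form.
var_beta_hat = sigma2 * XtX_inv

# Equation (14): the variance of each estimator is a diagonal entry.
var_each = np.diag(var_beta_hat)
```

The two matrices agree up to rounding, which is the cancellation of $X'X$ against one of its inverses in (12).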

Empirical example

A distributor of frozen dessert pies wants to evaluate factors that influence demand.

- Dependent variable: $y$ = pie sales (units per week)
- Independent variables: $x_1$ = price ($), $x_2$ = advertising ($100's)

Data are collected for 15 weeks. The OLS estimates give the following estimated model:

$$ \hat{y} = 306 - 24 x_1 + 74 x_2 \tag{15} $$

Interpretation of the estimated coefficients

Each $\hat{\beta}_j$ estimates that the average value of $Y$ changes by $\hat{\beta}_j$ units for each 1 unit increase in $X_j$, holding all other independent variables constant.

- Example: $\hat{\beta}_1 = -24$. Sales ($y$) are expected to decrease, on average, by 24 pies per week for each $1 increase in selling price ($x_1$), net of the effects due to advertising ($x_2$).
- Example: $\hat{\beta}_2 = 74$. Sales ($y$) are expected to increase, on average, by 74 pies per week for each $100 increase in advertising ($x_2$), net of the effects due to price ($x_1$).
- Intercept: the estimated average value of $y$ when all $X$ variables are zero.

Using the model to make predictions

We can calculate the predicted sales per week given the selling price ($5) and the advertising expenses ($350):

$$ \hat{y} = 306 - 24(5) + 74(3.5) = 445 \tag{16} $$

We predict sales of 445 pies per week.
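The arithmetic in (16) can be reproduced with a few lines of Python. The function name `predict_pie_sales` is a hypothetical helper introduced here; the coefficients are those of the estimated model (15). Note that advertising enters in hundreds of dollars, so $350 becomes 3.5.

```python
def predict_pie_sales(price_dollars, advertising_hundreds):
    """Estimated model (15): pies sold per week."""
    return 306 - 24 * price_dollars + 74 * advertising_hundreds

# Price of $5 and advertising of $350, i.e. 3.5 hundreds of dollars.
y_hat = predict_pie_sales(5, 350 / 100)
print(y_hat)  # 445.0
```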

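The $\sigma^2$ estimator

$$ e = Y - X\hat{\beta} = X\beta + \varepsilon - X(X'X)^{-1}X'(X\beta + \varepsilon) \tag{17} $$

$$ e = X\beta + \varepsilon - X\beta - X(X'X)^{-1}X'\varepsilon \tag{18} $$

$$ e = \left( I - X(X'X)^{-1}X' \right)\varepsilon = M_x \varepsilon \tag{19} $$

$M_x$ is an idempotent symmetric matrix. This implies that:

$$ M_x = M_x' = M_x^k; \quad k > 0 \tag{20} $$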

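The $\sigma^2$ estimator

We can prove that

$$ Q = e'e = \varepsilon' M_x' M_x \varepsilon = \varepsilon' M_x \varepsilon \tag{21} $$

$$ E[Q] = \sigma^2 (n - k) \tag{22} $$

so the unbiased estimator of the parameter $\sigma^2$ is

$$ E\left[ \frac{Q}{n-k} \right] = \sigma^2 \;\Rightarrow\; \hat{\sigma}^2 = \frac{Q}{n-k} = \frac{e'e}{n-k} \tag{23} $$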
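The properties of $M_x$ used in (19) to (23) can be checked on a small simulated example. The sketch below assumes synthetic data (the seed, `n`, `k`, `sigma`, and `beta` are illustrative choices, not from the slides). It also shows that the trace of $M_x$ equals $n - k$, which is where the degrees of freedom in (22) come from, since $E[\varepsilon' M_x \varepsilon] = \sigma^2 \, tr(M_x)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 40, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -0.5])
eps = rng.normal(scale=sigma, size=n)
Y = X @ beta + eps

# Residual-maker matrix from equation (19).
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# OLS fit and residuals.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

# Equation (23): unbiased estimator of sigma^2.
sigma2_hat = (e @ e) / (n - k)
```

In this simulation the errors `eps` are known, so one can confirm directly that `e` equals `M @ eps`; with real data only `e` is observable.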

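ANOVA

We add the following to the standard assumptions: the error term has a normal distribution, so $\varepsilon_i \sim N(0, \sigma^2)$.

Recalling that

$$ \hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1}X'Y \tag{24} $$

it follows that

$$ Y_i \sim N\left( X_i\beta, \sigma^2 \right) \tag{25} $$

$$ \hat{\beta}_i \sim N\left( \beta_i, \sigma^2 \left[ (X'X)^{-1} \right]_{ii} \right) \tag{26} $$

$$ \hat{\beta} \sim N\left( \beta, \sigma^2 (X'X)^{-1} \right) \tag{27} $$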

ANOVA

In order to test the significance of the whole model, we need to test the hypothesis system

$$ H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 $$

$$ H_1: \text{at least one } \beta_j \neq 0 $$

The test is

$$ F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{(k-1,\, n-k)} $$

ANOVA

| Source | SS | d.f. | Mean Square (MS) |
|---|---|---|---|
| Regression | $ESS = \hat{\beta}'X'y = y'y\,R^2$ | $k-1$ | $\hat{\beta}'X'y/(k-1)$ |
| Residual | $RSS = e'e = y'y\,(1-R^2)$ | $n-k$ | $e'e/(n-k)$ |
| Total | $TSS = y'y = \sum y_i^2$ | $n-1$ | |

Table 1

- Construct the F statistic $F = \dfrac{ESS/(k-1)}{RSS/(n-k)}$
- Find the 95th or the 99th percentile of the distribution $F_{(k-1),(n-k)}$
- If $F > F_{(1-\alpha);(k-1),(n-k)}$ one rejects $H_0$

$R^2$

$$ R^2 = \frac{ESS}{TSS} \geq 0 \tag{28} $$

$$ R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum Y_i^2} \leq 1 \tag{29} $$

$$ 0 \leq R^2 \leq 1 \tag{30} $$

Adjusted $R^2$

$R^2$ never decreases when a new $X$ variable is added to the model. This can be a disadvantage when comparing different models.

What is the net effect of adding a new variable?

- We lose a degree of freedom when a new $X$ variable is added
- Did the new $X$ variable add enough explanatory power to offset the loss of one degree of freedom?

The adjusted $R^2$ adjusts for the number of variables ($k$):

$$ R^2_{adj} = 1 - \frac{\sum e_i^2/(n-k)}{\sum Y_i^2/(n-1)} = 1 - \left( 1 - R^2 \right)\frac{(n-1)}{(n-k)} \tag{31} $$

Empirical example: ANOVA

| Source | SS | d.f. | MS |
|---|---|---|---|
| Regression | 29460 | 2 | 14730 |
| Residual | 27033 | 12 | 2252 |
| Total | 56493 | 14 | |

Table 2

$$ R^2 = 29460/56493 = 0.52 \tag{32} $$

52% of the variation in pie sales is explained by the variation in price and advertising.

$$ R^2_{adj} = 0.44 \tag{33} $$

44% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and the number of independent variables.
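The two goodness-of-fit figures can be recomputed from the sums of squares in Table 2. The sketch below takes $n = 15$ weeks and $k = 3$ parameters (intercept, price, advertising) from the example; the variable names are illustrative.

```python
# Sums of squares from Table 2 of the pie sales example.
ess, rss, tss = 29460, 27033, 56493
n, k = 15, 3   # 15 weeks; intercept + price + advertising

# Equation (28).
r2 = ess / tss

# Equation (31), in both of its forms.
r2_adj = 1 - (rss / (n - k)) / (tss / (n - 1))
r2_adj_alt = 1 - (1 - r2) * (n - 1) / (n - k)

print(round(r2, 2), round(r2_adj, 2))  # 0.52 0.44
```

Both forms of the adjusted $R^2$ give the same value because $ESS + RSS = TSS$ in this table.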

Empirical example: is the model significant?

F-test for the overall significance of the model:

- It shows whether there is a linear relationship between all of the $X$ variables considered together and $Y$
- Use the F test statistic
- In the estimated model the empirical value of F is equal to 6.54
- The critical value $F_{0.05;2,12}$ is equal to 3.88
- 6.54 > 3.88, therefore the regression model explains a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects $y$)
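The empirical F value can be reproduced from the ANOVA sums of squares of the pie sales example. In this sketch the critical value 3.88 is simply copied from the slide and not computed from the F distribution; the variable names are illustrative.

```python
# Sums of squares and dimensions from the pie sales example.
ess, rss = 29460, 27033
n, k = 15, 3

# F = [ESS / (k - 1)] / [RSS / (n - k)]
f_stat = (ess / (k - 1)) / (rss / (n - k))

# Critical value F(0.05; 2, 12) as reported on the slide.
f_crit = 3.88
reject_h0 = f_stat > f_crit

print(round(f_stat, 2), reject_h0)  # 6.54 True
```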

Single parameter t-test

$$ \hat{\beta} \sim N\left( \beta, \sigma^2 (X'X)^{-1} \right) \tag{34} $$

To test whether the individual variable $X_i$ has a significant effect on $Y$ we have to test

- $H_0: \beta_i = 0$ (no linear relationship)
- $H_1: \beta_i \neq 0$ (a linear relationship does exist between $X_i$ and $Y$)

If $\sigma^2$ is known, under $H_0$:

$$ \frac{\hat{\beta}_i - 0}{\sqrt{\sigma^2 \left[ (X'X)^{-1} \right]_{ii}}} \sim N(0, 1) \tag{35} $$

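Single parameter t-test

Generally $\sigma^2$ is unknown and we have to use its estimator

$$ \hat{\sigma}^2 = \frac{e'e}{n-k} \tag{36} $$

Therefore the standard error of $\hat{\beta}_i$ is

$$ se(\hat{\beta}_i) = \sqrt{\hat{\sigma}^2 \left[ (X'X)^{-1} \right]_{ii}} \tag{37} $$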

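Single parameter t-test

Under $H_0$ we have:

$$ \frac{ (\hat{\beta}_i - 0) \Big/ \sqrt{\sigma^2 \left[ (X'X)^{-1} \right]_{ii}} }{ \sqrt{ \dfrac{e'e}{\sigma^2} \Big/ (n-k) } } = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \sim t_{n-k} \tag{38} $$

- Find the quantile of the $t_{(n-k)}$ distribution at level $1 - \alpha/2$ (the 97.5th percentile for $\alpha = 0.05$, the 99.5th for $\alpha = 0.01$)
- If $|t| > t_{(1-\alpha/2);(n-k)}$ one rejects $H_0$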

Empirical example: are individual variables significant?

| | Coefficients | Standard Error | t |
|---|---|---|---|
| Intercept | 306 | 114.25 | 2.67 |
| Price | -24 | 10.83 | -2.21 |
| Advertising | 74 | 25.96 | 2.85 |

Table 3

- At the $\alpha = 0.05$ significance level, the t-value for price is -2.21, and $|-2.21| > t_{(\alpha/2;12)} = 2.1788$, so we reject $H_0$
- At the $\alpha = 0.05$ significance level, the t-value for advertising is 2.85 $> t_{(\alpha/2;12)} = 2.1788$, so we reject $H_0$

There is evidence that both price and advertising affect pie sales at $\alpha = 0.05$.
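The t statistics in Table 3 are the ratios of each coefficient to its standard error, and the decision compares the absolute value with the critical value. In the sketch below the critical value 2.1788 is copied from the slide. Because the displayed coefficients are rounded, the recomputed ratios can differ from the table in the last digit; the dictionary names are illustrative.

```python
# Coefficients and standard errors as displayed in Table 3 (rounded values).
coef = {"Intercept": 306.0, "Price": -24.0, "Advertising": 74.0}
se = {"Intercept": 114.25, "Price": 10.83, "Advertising": 25.96}

# Two-sided critical value t(alpha/2; 12) for alpha = 0.05, as given on the slide.
t_crit = 2.1788

# t statistic under H0: beta_i = 0 is coefficient / standard error.
t_stats = {name: coef[name] / se[name] for name in coef}

# Two-sided test: reject H0 when |t| exceeds the critical value.
reject = {name: abs(t) > t_crit for name, t in t_stats.items()}
```

Taking the absolute value matters for price: its t statistic is negative, so a direct comparison with the positive critical value would never reject.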

Until now we have assumed that in $Y = X\beta + \varepsilon$ the $X$ are cardinal variables. One can also use categorical explanatory variables (i.e. dummy variables) that identify specific factors depending on categories:

- Temporal effects
- Spatial effects
- Qualitative variables

We suppose that the dummies influence only the model intercept (not the slopes). If the dummy variable is significant, there will be different regression intercepts corresponding to the different groups or situations.

Let's assume the following generic model:

$$ \hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \tag{39} $$

where $Y$ is pie sales, $X_1$ is the price and $X_2$ is a holiday indicator function, equal to 1 when a holiday has occurred during the week and equal to 0 when there is no holiday during the week.

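Holiday: $\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 (1) = (\beta_0 + \beta_2) + \beta_1 X_1$

No holiday: $\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 (0) = \beta_0 + \beta_1 X_1$

$$ \text{different intercepts } (\beta_0 + \beta_2 \text{ vs. } \beta_0), \text{ same slope } (\beta_1) \tag{40} $$

[Figure: sales ($y$) against price ($x_1$), with one line for Holiday and one for No Holiday]

If $H_0: \beta_2 = 0$ is rejected, Holiday has a significant effect on pie sales.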

Example:

$$ Sales = 300 - 30(Price) + 15(Holiday) \tag{41} $$

- Sales = number of pies sold per week
- Price = pie price in $
- Holiday = 1 if a holiday has occurred, 0 if a holiday has not occurred

As we see, the dummy coefficient is $\hat{\beta}_2 = 15$. This implies that, on average, 15 more pies were sold in weeks with holidays than in weeks without holidays, given the same price.
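The effect of the dummy in (41) is a parallel shift of the regression line. A minimal sketch follows; the function name `predicted_sales` and the price of $5 are hypothetical choices for illustration.

```python
def predicted_sales(price, holiday):
    """Equation (41); holiday is 1 if a holiday occurred in the week, else 0."""
    return 300 - 30 * price + 15 * holiday

# Hypothetical price of $5, with and without a holiday.
with_holiday = predicted_sales(5, 1)
without_holiday = predicted_sales(5, 0)

# The gap equals the dummy coefficient, whatever the price.
shift = with_holiday - without_holiday
print(with_holiday, without_holiday, shift)  # 165 150 15
```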

The number of dummy variables must always be one less than the number of levels to be distinguished. Imagine that we are analyzing the housing market and have three different housing types: {ranch, split level, condo}.

Example:

$$ \hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \tag{42} $$

- $Y$ = house price
- $X_1$ = square meters
- $X_2$ = 1 if ranch, 0 if not ($\beta_2$: impact of ranch vs. condo)
- $X_3$ = 1 if split level, 0 if not ($\beta_3$: impact of split level vs. condo)

Suppose the estimated equation is:

$$ \hat{Y} = 20 + 0.05 X_1 + 24 X_2 + 15 X_3 \tag{43} $$

Then we have the following.

For a condo ($x_2 = x_3 = 0$):

$$ \hat{y} = 20 + 0.05 X_1 \tag{44} $$

For a ranch ($x_2 = 1$, $x_3 = 0$):

$$ \hat{y} = 44 + 0.05 X_1 \tag{45} $$

For a split level ($x_2 = 0$, $x_3 = 1$):

$$ \hat{y} = 35 + 0.05 X_1 \tag{46} $$
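The three intercepts in (44) to (46) follow from encoding a three-level category with two dummies, with condo as the baseline. The sketch below uses hypothetical helper names (`encode`, `predicted_price`) and the coefficients of the estimated equation (43).

```python
def encode(house_type):
    """Two dummies for three levels; condo is the baseline (both dummies 0)."""
    x2 = 1 if house_type == "ranch" else 0
    x3 = 1 if house_type == "split level" else 0
    return x2, x3

def predicted_price(square_meters, house_type):
    """Estimated equation (43)."""
    x2, x3 = encode(house_type)
    return 20 + 0.05 * square_meters + 24 * x2 + 15 * x3

# Evaluating at zero square meters isolates the intercept of each housing type.
intercepts = {t: predicted_price(0, t) for t in ("condo", "ranch", "split level")}
print(intercepts)  # {'condo': 20.0, 'ranch': 44.0, 'split level': 35.0}
```

A third dummy for condo is not needed: it would be an exact linear combination of the intercept column and the other two dummies, so $X$ would no longer have full rank.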
