Multiple Regression Analysis


Roland Szilágyi, Ph.D., Associate Professor

Correlation describes the strength of a relationship, the degree to which one variable is linearly related to another. Regression shows us how to determine the nature of a relationship between two or more variables. X (or X₁, X₂, …, Xₚ): known variable(s) / independent variable(s) / predictor(s). Y: unknown variable / dependent variable. Causal relationship: X causes Y to change.

Simple Linear Regression Model. We model the relationship between two variables, X and Y, as a straight line. The model contains two parameters: an intercept parameter and a slope parameter. Y = β₀ + β₁X + ε, where Y is the dependent or response variable (the variable we wish to explain or predict), X is the independent or predictor variable, ε is the random error component, β₀ is the y-intercept of the line, i.e. the point at which the line intercepts the y-axis, and β₁ is the slope of the line. Y = deterministic component + random error.

Deterministic component: ŷ = β₀ + β₁x; random error: ε. y = deterministic component + random error. We always assume that the mean value of the random error equals 0, so the mean value of y equals the deterministic component. It is possible to find many lines for which the sum of the errors is equal to 0, but there is one (and only one) line for which the SSE (sum of squared errors) is a minimum: the least squares line / regression line.

The method of least squares gives us the best linear unbiased estimators (BLUE) of the regression parameters β₀ and β₁. The least-squares estimators: b₀ estimates β₀, b₁ estimates β₁. Calculation of the estimators: f(b₀, b₁) = Σ (yᵢ − b₀ − b₁xᵢ)² → min! The regression line: ŷ = b₀ + b₁x.

Least Squares Method. The partial derivatives must equal 0: ∂f/∂b₀ = 0 and ∂f/∂b₁ = 0. The normal equations: Σyᵢ = n·b₀ + b₁·Σxᵢ and Σxᵢyᵢ = b₀·Σxᵢ + b₁·Σxᵢ². The estimated regression line: ŷ = b₀ + b₁x.
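
A minimal sketch of solving these two normal equations numerically with NumPy; the data values are invented for illustration and are not from the lecture:

```python
# Illustrative sketch: solve the simple-regression normal equations for b0 and b1.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical response values
n = len(x)

# Normal equations:  sum(y)   = n*b0      + b1*sum(x)
#                    sum(x*y) = b0*sum(x) + b1*sum(x**2)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, rhs)

y_hat = b0 + b1 * x   # the estimated regression line
print(b0, b1)
```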

Multiple Linear Regression Model. The multiple linear regression equation describes the relation between the independent variables (X₁, X₂, …, Xₚ) and the dependent variable. Y depends on X₁, X₂, …, Xₚ (p independent variables) and on the error term (ε); β₀, β₁, …, βₚ are the regression coefficients. Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ + ε. Y = deterministic component + random error.

Least Squares Method. The method of least squares gives us the best linear unbiased estimators (BLUE) of the regression parameters (β₀, β₁, β₂, …, βₚ): f(b₀; b₁; …; bₚ) = Σ (yᵢ − b₀ − b₁xᵢ₁ − … − bₚxᵢₚ)² → min. The estimated equation: ŷ = b₀ + b₁x₁ + … + bₚxₚ.

Data Structure of Multiple Linear Regression. The n observations are collected in the vector y = (y₁, y₂, …, yₙ)ᵀ and in the matrix X, whose first column consists of 1s (for the constant b₀) and whose remaining p columns contain the observed values xᵢⱼ of the independent variables.

Multiple Linear Regression. Minimise f(b₀; b₁; …; bₚ) = Σ (yᵢ − b₀ − b₁xᵢ₁ − … − bₚxᵢₚ)²; setting the partial derivatives ∂f/∂bⱼ = 0 (j = 0, 1, …, p) gives a system of p + 1 normal equations.

The equation system with matrix operations: XᵀX·b = Xᵀy.

The equation system with matrix operations: b = (XᵀX)⁻¹Xᵀy. With the help of this result we can give the estimate of the regression equation (the empirical regression equation; the sample model).
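
A minimal sketch of the matrix-form estimation b = (XᵀX)⁻¹Xᵀy with NumPy; the simulated data and coefficient values are assumptions for illustration only:

```python
# Illustrative sketch: OLS estimation in matrix form, b = (X^T X)^(-1) X^T y.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X_raw = rng.normal(size=(n, p))                    # two hypothetical independent variables
y = 1.0 + 2.0 * X_raw[:, 0] - 0.5 * X_raw[:, 1] + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), X_raw])           # first column of ones for the intercept b0
b = np.linalg.solve(X.T @ X, X.T @ y)              # solves the normal equations (X^T X) b = X^T y
y_hat = X @ b                                      # the empirical regression equation
print(b)                                           # [b0, b1, b2]
```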

Interpretation of Parameters. ŷ = b₀ + b₁x₁ + … + bₚxₚ. The intercept (b₀) can be interpreted as the value you would predict for the dependent variable if every Xⱼ = 0. Whether this interpretation is meaningful depends on one hand on whether 0 is part of the observed X values, and on the other hand on whether 0 is part of the observed Y values.

Interpretation of Parameters. ŷ = b₀ + b₁x₁ + … + bₚxₚ. In a geometric sense, coefficient bⱼ is the slope of the regression line with respect to Xⱼ: it shows the average change in the dependent variable for each one-unit increase in Xⱼ, if the other independent variables remain constant.

Residual variable. eᵢ = yᵢ − ŷᵢ. The total sum of squares decomposes as SST = SSR + SSE, where SST = Σ(yᵢ − ȳ)² is the sum of squares of Y, SSR = Σ(ŷᵢ − ȳ)² is the sum of squares explained by the regression, and SSE = Σ(yᵢ − ŷᵢ)² is the sum of squares of the errors.

Analysis of Variance in Regression Analysis

Source       Sum of Squares        df           Mean Sum of Squares
Regression   SSR = Σ(ŷᵢ − ȳ)²      p            MSR = SSR / p
Residual     SSE = Σ(yᵢ − ŷᵢ)²     n − p − 1    MSE = SSE / (n − p − 1)
Total        SST = Σ(yᵢ − ȳ)²      n − 1

F = MSR / MSE = (SSR / p) / (SSE / (n − p − 1)), and Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)².

Model Testing. H₀: β₁ = β₂ = … = βₚ = 0; H₁: at least one βⱼ ≠ 0. Test statistic: F = (SSR / p) / (SSE / (n − p − 1)). Accept H₀ if F ≤ F_α(p; n − p − 1); reject H₀ if F > F_α(p; n − p − 1).
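
A minimal sketch of the ANOVA decomposition and the overall F test described above, assuming X and b as in the previous sketch (X containing the intercept column); the function name regression_anova is ours, not from the lecture:

```python
# Illustrative sketch: ANOVA decomposition and F test for H0: beta_1 = ... = beta_p = 0.
import numpy as np
from scipy import stats

def regression_anova(X, y, b):
    n, k = X.shape                          # k = p + 1 columns, including the constant
    p = k - 1
    y_hat = X @ b
    sse = np.sum((y - y_hat) ** 2)          # sum of squares of the errors
    ssr = np.sum((y_hat - y.mean()) ** 2)   # sum of squares explained by the regression
    sst = np.sum((y - y.mean()) ** 2)       # total sum of squares (= ssr + sse)
    msr = ssr / p
    mse = sse / (n - p - 1)
    f_stat = msr / mse
    p_value = stats.f.sf(f_stat, p, n - p - 1)   # P(F > f_stat) under H0
    return ssr, sse, sst, f_stat, p_value
```

H₀ is rejected at level α when the p-value falls below α, equivalently when F exceeds F_α(p; n − p − 1).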

Parameter testing. H₀: βⱼ = 0; H₁: βⱼ ≠ 0. Test statistic: t = bⱼ / s(bⱼ), where s(bⱼ) is the estimated standard error of bⱼ. If |t_calculated| < t_critical we accept H₀; if |t_calculated| > t_critical we reject H₀, where t_critical = t_{α/2; n − p − 1}.
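
A minimal sketch of the individual parameter tests, again assuming X includes the intercept column; s(bⱼ) is taken from the diagonal of MSE·(XᵀX)⁻¹, and the helper name coefficient_t_tests is illustrative:

```python
# Illustrative sketch: t statistics for H0: beta_j = 0 against H1: beta_j != 0.
import numpy as np
from scipy import stats

def coefficient_t_tests(X, y, b, alpha=0.05):
    n, k = X.shape
    p = k - 1
    resid = y - X @ b
    mse = resid @ resid / (n - p - 1)
    cov_b = mse * np.linalg.inv(X.T @ X)   # estimated covariance matrix of b
    se_b = np.sqrt(np.diag(cov_b))         # standard errors s(b_j)
    t_stats = b / se_b
    t_crit = stats.t.ppf(1 - alpha / 2, n - p - 1)
    reject = np.abs(t_stats) > t_crit      # True where H0: beta_j = 0 is rejected
    return t_stats, t_crit, reject
```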

Assumptions of the Multiple Linear Regression Model. Assumptions of the error term: 1. The expected value of the error term equals 0: E(ε | X₁, X₂, …, Xₚ) = 0. 2. Constant variance (homoscedasticity): Var(ε) = σ². 3. The error term is uncorrelated across observations. 4. Normally distributed error term.

Assumptions of the independent variables: 1. Linear independence. 2. Fixed values, which do not change from sample to sample. 3. There is no scale (measurement) error. 4. The independent variables are uncorrelated with the error term.

1. E(ε | X₁, X₂, …, Xₚ) = 0. This assumption means that the residual should be neutral. If the expected value were not 0, that systematic tendency could be integrated into the deterministic part of the model. If the method of estimation for the regression model is least squares, the average residual will be 0.

2. Homoscedasticity (Var(ε) = σ²): the variance of the error term is the same for all observations. Testing: plots of residuals versus the independent variables (or the predicted value ŷ, or time); statistical tests, e.g. the Goldfeld-Quandt test (especially when the heteroscedasticity is related to one of the independent variables).

Graphical tests for homoscedasticity: plots of the residuals e against the fitted values ŷ (panels contrasting homoscedastic residuals with heteroscedastic residuals).

Goldfeld-Quandt test. H₀: σ₁² = σ₂² (homoscedasticity); H₁: σ₁² ≠ σ₂². Steps: 1. Ranking: sort the cases by the suspected independent variable. 2. Subgroups: split the ordered sample, omitting r central observations (r > 0). 3. Calculate the mean square errors (sₑ²) from the separate regressions on the 1st and 3rd subgroups. 4. F-test: F = sₑ₂² / sₑ₁², with ν₁ = ν₂ = (n − r)/2 − (p + 1) degrees of freedom; accept H₀ if F_{α/2} ≤ F ≤ F_{1−α/2; ν₁, ν₂}.
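
A minimal sketch of these four steps with NumPy; the sorting column, the subgroup handling and the helper name goldfeld_quandt are assumptions for illustration:

```python
# Illustrative sketch: Goldfeld-Quandt test by splitting the ordered sample.
import numpy as np

def goldfeld_quandt(X, y, sort_col=1, r=0):
    """X includes the constant column; sort_col is the suspected variable."""
    n, k = X.shape
    order = np.argsort(X[:, sort_col])      # step 1: rank the cases by the variable
    Xs, ys = X[order], y[order]
    m = (n - r) // 2                        # step 2: size of each outer subgroup

    def mse(Xg, yg):                        # step 3: mean square error of a subgroup regression
        b = np.linalg.solve(Xg.T @ Xg, Xg.T @ yg)
        e = yg - Xg @ b
        return (e @ e) / (len(yg) - k)

    s1 = mse(Xs[:m], ys[:m])
    s2 = mse(Xs[-m:], ys[-m:])
    return s2 / s1                          # step 4: F statistic with (m - k, m - k) df
```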

3. The error term is uncorrelated across observations. In the case of cross-sectional data, the observations meet the assumption of simple random sampling, so we do not have to test this hypothesis. Before making estimates from time series data, however, we need to examine the residual autocorrelation.

Causes of autocorrelation: we did not include every important explanatory variable in the model (we cannot recognise the effect, no data, short time series); the model specification is wrong, i.e. the relationship is not linear but we use linear regression; non-random scaling (measurement) errors.

Plots to detect autocorrelation: plots of the residual eₜ against eₜ₋₁ and against time t. Patterned panels indicate that an independent variable is missing from the equation, or that we should use another type of function.

The Durbin-Watson test. H₀: ρ = 0 (no autocorrelation); H₁: ρ ≠ 0 (autocorrelation). Test statistic: d = Σ(eₜ − eₜ₋₁)² / Σeₜ² (numerator summed over t = 2, …, n; denominator over t = 1, …, n). Limits: 0 ≤ d ≤ 4; positive autocorrelation gives 0 ≤ d < 2, negative autocorrelation gives 2 < d ≤ 4. Decision regions on the scale 0, d_L, d_U, 4 − d_U, 4 − d_L, 4: violation with positive autocorrelation, no decision, no problem, no decision, violation with negative autocorrelation. In the no-decision zones (weaker problem): use more variables, or use a larger database.

Decision table for the Durbin-Watson test:
Accept H₀ (ρ = 0): d_U < d < 4 − d_U
Reject H₀, ρ > 0 (positive autocorrelation): d < d_L
Reject H₀, ρ < 0 (negative autocorrelation): d > 4 − d_L
No decision: d_L < d < d_U or 4 − d_U < d < 4 − d_L
Source: Kerékgyártó-Mundruczó [1999]
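
A minimal sketch of computing the Durbin-Watson statistic d directly from the time-ordered residuals:

```python
# Illustrative sketch: Durbin-Watson statistic, 0 <= d <= 4, d near 2 means no autocorrelation.
import numpy as np

def durbin_watson(residuals):
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```

The resulting d is then compared with the tabulated bounds d_L and d_U as in the decision table above.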

4. Normally distributed errors. Testing: plots; quantitative tests (goodness-of-fit tests): chi-square test, Kolmogorov-Smirnov test.

Graphical testing: a plot of the values of the residuals (e) against normally distributed values (z), i.e. a normal probability plot. The assumption is not violated when the figure is nearly linear.

Histogram of residuals.

Goodness-of-fit test. H₀: Pr(ε ∈ class j) = Pⱼ for all j (the distribution is normal); H₁: there is a j with Pr(ε ∈ class j) ≠ Pⱼ. Test statistic: χ² = Σⱼ (fⱼ − nPⱼ)² / (nPⱼ), summed over the r classes, where fⱼ is the observed frequency in class j; accept H₀ if χ² does not exceed the critical value χ²_α with r − 1 degrees of freedom, reduced by the number of estimated parameters.
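
A minimal sketch of the quantitative normality checks named above (chi-square goodness of fit on binned residuals and a Kolmogorov-Smirnov test against the fitted normal) using SciPy; the number of classes and the helper name normality_checks are illustrative choices:

```python
# Illustrative sketch: chi-square and Kolmogorov-Smirnov checks of residual normality.
import numpy as np
from scipy import stats

def normality_checks(residuals, bins=8):
    e = np.asarray(residuals)
    mu, sigma = e.mean(), e.std(ddof=1)

    # Chi-square goodness of fit: observed vs. expected class frequencies
    edges = np.linspace(e.min(), e.max(), bins + 1)
    observed, _ = np.histogram(e, bins=edges)
    cdf = stats.norm.cdf(edges, loc=mu, scale=sigma)
    expected = len(e) * np.diff(cdf)
    expected *= observed.sum() / expected.sum()    # rescale so the totals match
    chi2_stat = np.sum((observed - expected) ** 2 / expected)

    # Kolmogorov-Smirnov test against N(mu, sigma)
    ks_stat, ks_p = stats.kstest(e, 'norm', args=(mu, sigma))
    return chi2_stat, ks_stat, ks_p
```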

Assumtons of the ndeendent varales. Lnear ndeendenc. (the ndeendent varales should not e an eact lnear comnaton of other ndeendent varales). F values, whch do not change samle samle. 3. There s no scale error. 4. The ndeendent varale s uncorrelated wth the error term.

Multicollinearity. Testing: fit the auxiliary regression models Xⱼ = f(X₁, X₂, …, Xⱼ₋₁, Xⱼ₊₁, …, Xₚ) and examine the multiple coefficient of determination (Rⱼ²); F-test (F > F_crit); VIF indicator.

VIF indicator (Variance Inflation Factor): VIFⱼ = 1 / (1 − Rⱼ²). VIFⱼ = 1 if Rⱼ² = 0 (the jth independent variable does not correlate with the others); VIFⱼ → ∞ if Rⱼ² = 1 (the jth independent variable is an exact linear combination of the other independent variables). Values of VIF only slightly above 1 indicate weak multicollinearity; larger values up to about 5 indicate strong, disturbing multicollinearity; VIF above 5 indicates very strong, harmful multicollinearity.
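
A minimal sketch of computing VIFⱼ by regressing each independent variable on the others; the helper name variance_inflation_factors is illustrative:

```python
# Illustrative sketch: VIF_j = 1 / (1 - R_j^2) from auxiliary regressions.
import numpy as np

def variance_inflation_factors(X_raw):
    """X_raw: (n, p) matrix of independent variables, without the constant column."""
    n, p = X_raw.shape
    vifs = []
    for j in range(p):
        target = X_raw[:, j]
        others = np.column_stack([np.ones(n), np.delete(X_raw, j, axis=1)])
        b = np.linalg.lstsq(others, target, rcond=None)[0]
        resid = target - others @ b
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)
```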

Correction for Multicollinearity. We should find the offending independent variables and exclude them. Alternatively, we can combine independent variables which are strongly correlated (creating principal components); these will differ from the original independent variables but will contain their information content.

Thanks for your attention!