Multiple Regression Analysis
Roland Szilágyi, Ph.D., Associate Professor
Correlation describes the strength of a relationship: the degree to which one variable is linearly related to another.
Regression shows us how to determine the nature of a relationship between two or more variables.
X (or $X_1, X_2, \dots, X_p$): known variable(s) / independent variable(s) / predictor(s)
Y: unknown variable / dependent variable
Causal relationship: X "causes" Y to change.
Simple Linear Regression Model
We model the relationship between two variables, X and Y, as a straight line. The model contains two parameters: an intercept parameter and a slope parameter.
$Y = \beta_0 + \beta_1 x + \varepsilon$
where:
y: dependent or response variable (the variable we wish to explain or predict)
x: independent or predictor variable
$\varepsilon$: random error component
$\beta_0$: y-intercept of the line, i.e. the point at which the line intercepts the y-axis
$\beta_1$: slope of the line
Y = deterministic component + random error
Deterministic component: $\hat{y} = b_0 + b_1 x$
y = deterministic component + random error
We always assume that the mean value of the random error equals 0, so the mean value of y equals the deterministic component.
It is possible to find many lines for which the sum of the errors is equal to 0, but there is one (and only one) line for which the SSE (sum of squares of the errors) is a minimum: the least squares line, or regression line.
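The method of least squares gives us the best linear unbiased estimators (BLUE) of the regression parameters $\beta_0$, $\beta_1$.
The least-squares estimators:
$b_0$ estimates $\beta_0$
$b_1$ estimates $\beta_1$
Calculation of the estimators:
$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \to \min$
The regression line: $\hat{y} = b_0 + b_1 x$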
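Least Squares Method
The minimum is where the partial derivatives are equal to 0:
$\frac{\partial f}{\partial b_0} = 0, \qquad \frac{\partial f}{\partial b_1} = 0$
The normal equations:
$\sum y = n b_0 + b_1 \sum x$
$\sum xy = b_0 \sum x + b_1 \sum x^2$
The estimated regression line: $\hat{y} = b_0 + b_1 x$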
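As a small illustration of the two normal equations for the simple model, the sketch below solves them directly for b0 and b1. The function name `fit_simple_ols` and the five data points are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def fit_simple_ols(x, y):
    """Solve the two normal equations of simple linear regression for b0, b1."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Eliminating b0 from the two normal equations gives b1
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The first normal equation then gives b0
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Hypothetical illustrative data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sum(residuals))
```

For these data the residuals sum to zero up to rounding, which is what the first normal equation requires.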
Multiple Linear Regression Model
The multiple linear regression line describes the relation between the independent variables ($X_1, X_2, \dots, X_p$) and the dependent variable.
Y depends on:
$X_1, X_2, \dots, X_p$ (p independent variables)
the error term ($\varepsilon$)
$\beta_0, \beta_1, \dots, \beta_p$ regression coefficients
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
Y = deterministic component + random error
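Least Squares Method
The method of least squares gives us the best linear unbiased estimators (BLUE) of the regression parameters ($\beta_0, \beta_1, \beta_2, \dots, \beta_p$).
$f(b_0; b_1; b_2; \dots; b_p) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \dots - b_p x_{ip})^2 \to \min$
$\hat{y} = b_0 + b_1 x_1 + \dots + b_p x_p$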
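Data Structure of Multiple Linear Regression
$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ 1 & x_{21} & \dots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}$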
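Multiple Linear Regression
$f(b_0; b_1; \dots; b_p) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \dots - b_p x_{ip})^2 \to \min$
The partial derivatives are set equal to 0:
$\frac{\partial f}{\partial b_j} = 0, \quad j = 0, 1, \dots, p$
This gives a system of p + 1 normal equations, the first of which is
$\sum y_i = n b_0 + b_1 \sum x_{i1} + \dots + b_p \sum x_{ip}$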
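The equation system with matrix operations:
$\mathbf{y} = (y_1, \dots, y_n)^T$, $\mathbf{b} = (b_0, b_1, \dots, b_p)^T$, and X is the n × (p + 1) data matrix whose first column consists of ones.
$X^T X \mathbf{b} = X^T \mathbf{y}$
$\mathbf{b} = (X^T X)^{-1} X^T \mathbf{y}$
With the help of these results we can give the estimate of the regression equation (the empirical regression equation; the sample model):
$\hat{\mathbf{y}} = X \mathbf{b} = X (X^T X)^{-1} X^T \mathbf{y}$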
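The matrix form of the normal equations can be sketched in a few lines of NumPy. The data below are hypothetical and constructed without noise, so the estimates should reproduce the coefficients used to generate y. The sketch solves the linear system rather than forming the inverse explicitly, which is a common numerical choice and not something the slides prescribe.

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 independent variables
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 - 0.5 * x2  # exact linear relation, no random error

# Data matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations: (X^T X) b = X^T y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Empirical regression equation evaluated at the sample points
y_hat = X @ b
print(b)
```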
Interpretation of Parameters
$\hat{y} = b_0 + b_1 x_1 + \dots + b_p x_p$
The intercept ($b_0$) can be interpreted as the value you would predict for the dependent variable if every $X_j = 0$.
Whether this interpretation is meaningful depends on one hand on whether 0 is part of the X values, and on the other hand on whether $b_0$ is part of the Y values.
Interpretation of Parameters
$\hat{y} = b_0 + b_1 x_1 + \dots + b_p x_p$
In a geometrical sense, the coefficient $b_j$ is a slope of the regression line: it shows an average change of $b_j$ units in the dependent variable for each one-unit difference (increase) in $X_j$, if the other independent variables remain constant.
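Residual variable
$e_i = y_i - \hat{y}_i$
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$S_y = S_{\hat{y}} + S_e$
$S_y$: sum of squares of Y
$S_{\hat{y}}$: sum of squares explained by the regression
$S_e$: sum of squares of the errors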
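Analysis of Variance in Regression Analysis

Source       Sum of squares   Df          Mean sum of squares
Regression   S_ŷ (SSR)        p           MSR = SSR / p
Residual     S_e (SSE)        n - p - 1   MSE = SSE / (n - p - 1)
Total        S_y              n - 1

$S_{\hat{y}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \quad S_e = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad S_y = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$S_y = S_{\hat{y}} + S_e$
$F = \frac{S_{\hat{y}} / p}{S_e / (n - p - 1)}$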
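The variance decomposition and the F statistic can be checked numerically. The following is a minimal sketch on simulated data; the sample size, coefficients, noise level and random seed are all assumptions made only for illustration.

```python
import numpy as np

# Simulated data (all values are illustrative assumptions)
rng = np.random.default_rng(0)
n, p = 30, 2
X_raw = rng.normal(size=(n, p))
y = 1.0 + X_raw @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Least squares fit with an intercept column
X = np.column_stack([np.ones(n), X_raw])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

s_y = np.sum((y - y.mean()) ** 2)        # total sum of squares
s_reg = np.sum((y_hat - y.mean()) ** 2)  # sum of squares explained by regression
s_e = np.sum((y - y_hat) ** 2)           # sum of squares of the errors

msr = s_reg / p            # mean square, regression (df = p)
mse = s_e / (n - p - 1)    # mean square, residual (df = n - p - 1)
f_stat = msr / mse
print(s_y, s_reg + s_e, f_stat)
```

The computed F value would then be compared with the critical value of the F distribution with (p, n - p - 1) degrees of freedom, taken from a table or a statistics library.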
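Model Testing
$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
$H_1: \exists j: \beta_j \neq 0$
Test statistic: $F = \frac{SSR / p}{SSE / (n - p - 1)}$
If $H_0$ is true, $\Pr\left(F \le F_{1-\alpha}(p;\ n-p-1)\right) = 1 - \alpha$.
$H_0$ is rejected if $F > F_{1-\alpha}(p;\ n-p-1)$; otherwise it is accepted.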
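Parameter testing
$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0$
$t = \frac{b_j}{s(b_j)}, \qquad s(b_j) = s_e \sqrt{v_{jj}}$
where $v_{jj}$ is the jth diagonal element of $(X^T X)^{-1}$ and $s_e^2 = S_e / (n - p - 1)$.
Critical value: $t_{1-\alpha/2}$ with $\nu = n - p - 1$ degrees of freedom.
If |t calculated| < t critical: $H_0$ is accepted.
If |t calculated| > t critical: $H_0$ is rejected.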
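A minimal sketch of the parameter test: the standard errors come from the diagonal of the inverse of X^T X and the residual standard error. The data are simulated, and the second independent variable is given a true coefficient of 0 on purpose; all numeric choices here are assumptions for illustration.

```python
import numpy as np

# Simulated data (illustrative assumptions): the second predictor has no effect
rng = np.random.default_rng(1)
n, p = 40, 2
X_raw = rng.normal(size=(n, p))
y = 0.5 + 3.0 * X_raw[:, 0] + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_raw])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

e = y - X @ b                              # residuals
s_e = np.sqrt(e @ e / (n - p - 1))         # residual standard error
se_b = s_e * np.sqrt(np.diag(XtX_inv))     # standard error of each coefficient
t = b / se_b                               # t statistic for H0: beta_j = 0
print(t)
```

Each |t| value is compared with the two-sided critical value of the t distribution with n - p - 1 degrees of freedom.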
Assumptions of the Multiple Linear Regression Model
Assumptions of the error term
1. The expected value of the error term equals 0: $E(\varepsilon \mid X_1, X_2, \dots, X_p) = 0$
2. Constant variance (homoscedasticity): $Var(\varepsilon) = \sigma^2$
3. The error term is uncorrelated across observations.
4. Normally distributed error term.
Assumptions of the independent variables
1. Linear independence.
2. Fixed values, which do not change from sample to sample.
3. There is no scale (measurement) error.
4. The independent variable is uncorrelated with the error term.
1. $E(\varepsilon \mid X_1, X_2, \dots, X_p) = 0$
The assumption means that the residual should be neutral. If the expected value of 0 does not hold, the error has a systematic tendency, and that tendency could be integrated into the deterministic part of the model.
If the regression model is estimated by least squares (with an intercept), the average residual will be 0.
2. Homoscedasticity ($Var(\varepsilon) = \sigma^2$): the variance of the error term is the same for all observations.
Testing:
o Plots of residuals versus independent variables (or predicted value $\hat{y}$, or time)
o Statistical tests: Goldfeld-Quandt test (especially when the heteroscedasticity is related to one of the independent variables)
Graphical tests for homoscedasticity
[Figure: residuals e plotted against $\hat{y}$. Panels: homoscedastic residuals; heteroscedastic residuals.]
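Goldfeld-Quandt test
$H_0: \sigma_j^2 = \sigma^2$
$H_1: \sigma_j^2 \neq \sigma^2$
Steps:
1. Ranking: sort the cases by the variable x.
2. Subgroups: split the sorted cases into three subgroups of sizes $\frac{n-r}{2}$, $r$, $\frac{n-r}{2}$ (where $r > 0$ and $\frac{n-r}{2} > p$).
3. Calculate the mean square errors ($s_e^2$) from separate regressions on the 1st and 3rd subgroups.
4. F-test: $F = \frac{s_{e,1}^2}{s_{e,3}^2}$ with $\nu_1 = \nu_2 = \frac{n-r}{2} - p - 1$ degrees of freedom.
$H_0$ is accepted if $F_{\alpha/2}(\nu_1, \nu_2) < F < F_{1-\alpha/2}(\nu_1, \nu_2)$.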
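The steps of the test can be sketched as follows for a single independent variable (p = 1). The helper names `residual_variance` and `goldfeld_quandt`, the simulated data and the choice r = 8 are assumptions for illustration; the ratio is formed as first subgroup over third subgroup, which is one of the two possible orderings for a two-sided test.

```python
import numpy as np


def residual_variance(x, y):
    """Mean square error of a simple regression of y on x, with its df."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    dof = len(x) - X.shape[1]  # subgroup size - p - 1, here with p = 1
    return e @ e / dof, dof


def goldfeld_quandt(x, y, r):
    """Sort by x, drop the r middle cases, compare the outer error variances."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    m = (len(x) - r) // 2                      # size of the 1st and 3rd subgroups
    s2_first, dof1 = residual_variance(x[:m], y[:m])
    s2_last, dof3 = residual_variance(x[-m:], y[-m:])
    return s2_first / s2_last, dof1, dof3


# Simulated heteroscedastic data: the error spread grows with x (assumption)
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 40)
y = 2.0 + 3.0 * x + rng.normal(size=40) * 0.5 * x

f_stat, dof1, dof3 = goldfeld_quandt(x, y, r=8)
print(f_stat, dof1, dof3)
```

With an error spread that grows with x, the ratio falls well below 1; it would be compared with the lower and upper critical values of the F distribution with (dof1, dof3) degrees of freedom.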
3. The error term is uncorrelated across observations
In the case of cross-sectional data the observations meet the assumption of simple random sampling, thus we do not have to test this hypothesis.
Before making estimates from time series data, we need to examine the residual autocorrelation.
Causes of autocorrelation
o We did not use every important explanatory variable in the model (we can't recognise the effect, no data, short time series).
o The model specification is wrong, i.e. the relationship is not linear, but we use linear regression.
o Non-random scaling (measurement) errors.
Plots to detect autocorrelation
[Figure: residuals $e_t$ plotted against $e_{t-1}$ and against time t. Panel captions: "An independent variable is missing from the equation." and "We should use another type of function."]
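The Durbin-Watson test
$H_0: \rho = 0$ (no autocorrelation)
$H_1: \rho \neq 0$ (autocorrelation)
$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$
Limits: $0 \le d \le 4$
Positive autocorrelation: $0 \le d < 2$
Negative autocorrelation: $2 < d \le 4$
Decision zones along the d axis:
0 to $d_l$: positive autocorrelation
$d_l$ to $d_u$: no decision
$d_u$ to $4 - d_u$: no problem
$4 - d_u$ to $4 - d_l$: no decision
$4 - d_l$ to 4: negative autocorrelation
Weaker problem (no decision): use more variables, or use a larger database.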
Decision table of the Durbin-Watson test

H1                                  Accept H0: ρ = 0   Reject H0     No decision
ρ > 0 (positive autocorrelation)    d > d_u            d < d_l       d_l < d < d_u
ρ < 0 (negative autocorrelation)    d < 4 - d_u        d > 4 - d_l   4 - d_u < d < 4 - d_l

Source: Kerékgyártó-Mundruczó [1999]
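The Durbin-Watson statistic and the decision zones can be sketched in plain Python. The function names and the limits d_l = 1.1 and d_u = 1.5 used below are hypothetical placeholders; real limits depend on n, the number of independent variables and the significance level, and must be read from a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(d, d_l, d_u):
    """Map d onto the decision zones 0, d_l, d_u, 4 - d_u, 4 - d_l, 4."""
    if d < d_l:
        return "positive autocorrelation"
    if d > 4 - d_l:
        return "negative autocorrelation"
    if d_u < d < 4 - d_u:
        return "no autocorrelation"
    return "no decision"


# Hypothetical residual series and hypothetical table limits
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period
trending = [-2.0, -1.0, 0.0, 1.0, 2.0]           # neighbours are similar

d_alt = durbin_watson(alternating)
d_trend = durbin_watson(trending)
print(d_alt, dw_decision(d_alt, d_l=1.1, d_u=1.5))
print(d_trend, dw_decision(d_trend, d_l=1.1, d_u=1.5))
```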
4. Normally distributed errors
Testing:
o Plots
o Quantitative tests (goodness-of-fit tests): chi-square test, Kolmogorov-Smirnov test
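Graphical testing
[Figure: residuals e plotted against the corresponding values z of the normal distribution.]
A plot of the values of the residuals against normally distributed values. The assumption is not violated when the figure is nearly linear.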
Histogram of residuals
[Figure: histogram of the residuals.]
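Goodness-of-fit test
$H_0: \Pr(\varepsilon_j) = P_j$ (the distribution is normal)
$H_1: \exists j: \Pr(\varepsilon_j) \neq P_j$
$\chi^2 = \sum_{j=1}^{r} \frac{(f_j - n P_j)^2}{n P_j}$
where r is the number of classes, $f_j$ the observed frequency and $n P_j$ the expected frequency of class j.
$H_0$ is accepted if $\chi^2 < \chi^2_{1-\alpha}(\nu)$, where the degrees of freedom $\nu$ equal r - 1 minus the number of distribution parameters estimated from the sample.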
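A minimal sketch of the chi-square goodness-of-fit statistic for normality, using only the Python standard library. The function name `chi_square_normal_gof`, the class boundaries and the artificial "residuals" (evenly spaced normal quantiles) are assumptions for illustration. Two parameters, the mean and the standard deviation, are estimated from the sample, so the degrees of freedom are taken as r - 1 - 2.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normal_gof(residuals, edges):
    """Chi-square statistic comparing class counts with a fitted normal."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    bounds = [float("-inf")] + sorted(edges) + [float("inf")]
    stat = 0.0
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        observed = sum(1 for e in residuals if lower <= e < upper)  # f_j
        p_lower = 0.0 if lower == float("-inf") else fitted.cdf(lower)
        p_upper = 1.0 if upper == float("inf") else fitted.cdf(upper)
        expected = n * (p_upper - p_lower)                          # n * P_j
        stat += (observed - expected) ** 2 / expected
    r = len(bounds) - 1   # number of classes
    dof = r - 1 - 2       # two parameters (mean, sd) estimated from the sample
    return stat, dof


# Artificial residuals: 40 evenly spaced quantiles of the standard normal
residuals = [NormalDist().inv_cdf((i + 0.5) / 40) for i in range(40)]
stat, dof = chi_square_normal_gof(residuals, edges=[-1.0, 0.0, 1.0])
print(stat, dof)
```

Because these artificial residuals follow the normal shape almost exactly, the statistic is close to 0; it would be compared with the chi-square critical value for the given degrees of freedom.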
Assumptions of the independent variables
1. Linear independence (the independent variables should not be an exact linear combination of other independent variables).
2. Fixed values, which do not change from sample to sample.
3. There is no scale (measurement) error.
4. The independent variable is uncorrelated with the error term.
Multicollinearity
Testing: $X_j = f(X_1, X_2, \dots, X_{j-1}, X_{j+1}, \dots, X_p)$ regression models:
o Multiple determination coefficient
o F-test ($F > F_{crit}$)
o VIF indicator
VIF indicator (Variance Inflation Factor)
$VIF_j = \frac{1}{1 - R_j^2}$
$VIF_j = 1$ if $R_j^2 = 0$ (the jth independent variable doesn't correlate with the others)
$VIF_j \to \infty$ if $R_j^2 = 1$ (the jth independent variable is an exact linear combination of other independent variables)
$1 \le VIF_j < 2$: weak multicollinearity
$2 \le VIF_j < 5$: strong, disturbing multicollinearity
$VIF_j \ge 5$: very strong, harmful multicollinearity
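The VIF of each independent variable can be sketched by regressing it on the remaining ones and applying VIF = 1 / (1 - R²). The function name `vif` and the simulated data are assumptions for illustration: the second variable is built as a noisy copy of the first, while the third is generated independently.

```python
import numpy as np


def vif(X):
    """VIF for each column of X (n x p predictor matrix, no constant column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress X_j on a constant and all other independent variables
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Simulated predictors (assumption): x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

Here the first two variables fall into the "very strong, harmful" band and the third into the "weak" band.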
Correction for Multicollinearity
We should find the offending independent variables and exclude them.
We can combine independent variables which are strongly correlated (creating principal components). The new variables will differ from the original independent variables, but will contain the information content of the original ones.
Thanks for your attention!